11:30 - 13:00
Wednesday Talks 2
Room: Salle Capitole-Daurade
Chair/s:
Paul THOMPSON
Submission 4
Deep Learning Boosts Citrullination Identification in Mass Spectrometry-Based Proteomics
Wednesday-Talks 2-Selected talk-04
Presented by: Chien-Yun Lee
Chien-Yun Lee 1, 2, Wassim Gabriel 1, 3, Rebecca Meelker Gonzalez 1, 2, Sophia Laposchan 1, 2, Mathias Wilhelm 1, 3
1 School of Life Sciences, Technical University of Munich, Germany
2 Young Investigator Group: Mass Spectrometry in Systems Neurosciences, Technical University of Munich, Germany
3 Computational Mass Spectrometry, Technical University of Munich, Germany
Detecting protein citrullination remains challenging due to its low abundance and the limited availability of enrichment tools. Direct identification of citrullination sites by high-resolution mass spectrometry has offered biological insights when enrichment is unavailable. While modern mass spectrometers are sensitive and accurate, errors from database searching remain dominant, especially because citrullination has the same mass increase as deamidation of asparagine and glutamine (N/Q). Manual inspection of candidate spectra therefore becomes crucial for validating citrullination identifications, but it hinders throughput in large-scale studies. Here, we present a precise and sensitive data analysis workflow to identify citrullination sites in proteomics datasets. The pipeline boosts identification with a deep learning model, Prosit-Cit, which predicts the retention time and fragment ion intensities of citrullinated peptides. Prosit-Cit is an extension of Prosit trained on ~53,000 spectra derived from ~2,100 synthetic citrullinated peptides. Our workflow achieves high precision in identifying citrullination, as evaluated with seven dilutions of 200 synthetic citrullinated peptides spiked into cellular tryptic digests. Re-analyzing ten human tissue proteomes with this workflow retrieved most known sites and identified 5-10 times more citrullination sites. Extending the search to an Arabidopsis tissue proteome dataset detected over 1,000 citrullination sites across 30 tissues, marking the first large-scale report of citrullination in plants. Specifically, we found a higher citrullination level in flowers and found citrullination to be ubiquitous across tissues, suggesting a broader significance of citrullination in Arabidopsis. This work presents a precise and high-throughput workflow for large-scale citrullination identification, setting a benchmark as the first survey of protein citrullination in plants and enabling biological discoveries of protein citrullination in both new and existing proteomics datasets.
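The ambiguity with deamidation mentioned in the abstract can be illustrated with a short calculation. The sketch below is not part of the presented workflow. It assumes standard monoisotopic atomic masses and treats both modifications as the net replacement of one nitrogen and one hydrogen by one oxygen. The names `MONOISOTOPIC_MASS` and `mass_shift` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: why precursor mass alone cannot separate
# citrullination (R) from deamidation (N/Q).

# Standard monoisotopic atomic masses in daltons.
MONOISOTOPIC_MASS = {
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def mass_shift(gained, lost):
    """Net mass change for dicts mapping element symbol -> atom count."""
    plus = sum(MONOISOTOPIC_MASS[el] * n for el, n in gained.items())
    minus = sum(MONOISOTOPIC_MASS[el] * n for el, n in lost.items())
    return plus - minus


# Citrullination: arginine -> citrulline (net +O, -N, -H).
citrullination_shift = mass_shift({"O": 1}, {"N": 1, "H": 1})

# Deamidation: Asn -> Asp or Gln -> Glu (net +O, -N, -H).
deamidation_shift = mass_shift({"O": 1}, {"N": 1, "H": 1})

print(f"citrullination: +{citrullination_shift:.6f} Da")
print(f"deamidation:    +{deamidation_shift:.6f} Da")
```

Both shifts come out at about +0.984 Da, so a database search must rely on the modified residue's position, fragment ions, and retention time rather than on precursor mass to tell the two apart.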