Leveraging Pretrained Deep Protein Language Model to Predict Peptide Collision Cross Section

Ayano Nakai-Kasai, Kosuke Ogata, Yasushi Ishihama, Toshiyuki Tanaka
DOI: https://doi.org/10.1101/2024.09.11.612388
2024-09-14
Abstract: Collision cross section (CCS) of peptide ions provides an important separation dimension in liquid chromatography/tandem mass spectrometry-based proteomics that incorporates ion mobility spectrometry (IMS), and its accurate prediction is the basis for advanced proteomics workflows. This paper describes novel experimental data and a novel prediction model for challenging CCS prediction tasks including longer peptides that tend to have higher charge states. The proposed model is based on a pretrained deep protein language model. While the conventional prediction model requires training from scratch, the proposed model can be trained in less time owing to the use of the pretrained model as a feature extractor. Results of experiments with the novel experimental data show that the proposed model succeeds in drastically reducing the training time while maintaining the same or even better prediction performance compared with the conventional method. Our approach presents the possibility of predicting various peptide properties in a greener manner in proteomic liquid chromatography/tandem mass spectrometry experiments.
Bioinformatics
What problem does this paper attempt to address?
The paper aims to address the problem of predicting peptide ion Collision Cross Section (CCS). Specifically:

- **Research Background**: In proteomics studies combining liquid chromatography/tandem mass spectrometry (LC/MS/MS) with ion mobility spectrometry (IMS), CCS provides an important separation dimension. Accurate prediction of CCS is crucial for advanced proteomics workflows.
- **Existing Challenges**: Most current prediction methods perform poorly for long peptide segments (which typically have higher charge states) due to a lack of relevant data and effective prediction models.
- **Proposed Method**: The authors propose a new method based on a pretrained deep protein language model, PPLN (Pretrained Protein Language Model-based Network). This method leverages features extracted from the pretrained model and combines them with other information (such as charge number and mass) to predict the CCS values of peptide ions.
- **Experimental Results**: The new method not only outperforms traditional LS-MLR and bidirectional LSTM methods in terms of prediction accuracy but also significantly reduces training time. Additionally, PPLN excels in handling long peptide segments with high charge states.

In summary, the paper introduces a pretrained deep protein language model to improve the accuracy of peptide ion CCS prediction and demonstrates the efficiency and accuracy of this method in practical applications.
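
The feature-extractor idea in the Proposed Method item can be illustrated with a minimal sketch. This is not the authors' PPLN architecture. The frozen extractor below is a hypothetical stand-in (a fixed random embedding table, mean-pooled over residues) for a real pretrained protein language model, the head is a plain least-squares linear layer, and the targets are synthetic toy values, not measured CCS. All function names are invented for this illustration. The point it shows is the training pattern: the sequence features are never updated, they are concatenated with charge and mass, and only a small head is fitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056

EMBED_DIM = 8
# ASSUMPTION: stand-in for a pretrained protein language model.
# A fixed random table plays the role of frozen, already-trained weights.
_FROZEN_TABLE = np.random.default_rng(0).normal(size=(len(AMINO_ACIDS), EMBED_DIM))


def frozen_embedding(sequence):
    """Mean-pool per-residue vectors; the table is never updated."""
    idx = [AMINO_ACIDS.index(aa) for aa in sequence]
    return _FROZEN_TABLE[idx].mean(axis=0)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def build_features(sequence, charge):
    """Concatenate sequence features with charge, mass, and a bias term."""
    extra = [charge, peptide_mass(sequence), 1.0]
    return np.concatenate([frozen_embedding(sequence), extra])


def fit_head(X, y):
    """Fit only the linear head by least squares (the trainable part)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict(X, w):
    return X @ w


def make_toy_dataset(n, seed=1):
    """Random peptides with SYNTHETIC targets (not real CCS values)."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n):
        length = int(rng.integers(7, 30))
        seq = "".join(rng.choice(list(AMINO_ACIDS), size=length))
        charge = int(rng.integers(2, 5))
        rows.append(build_features(seq, charge))
    X = np.stack(rows)
    # Arbitrary toy relation in mass (column -2) and charge (column -3).
    y = 0.3 * X[:, -2] + 20.0 * X[:, -3]
    return X, y


X, y = make_toy_dataset(40)
w = fit_head(X, y)
rmse = float(np.sqrt(np.mean((predict(X, w) - y) ** 2)))
print(f"training RMSE on toy targets: {rmse:.3g}")
```

Because the extractor is frozen, fitting reduces to solving one small linear system, which is the source of the training-time saving the summary describes. In a realistic setting the random table would be replaced by embeddings from an actual pretrained model and the linear head by a small neural network.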