Leveraging Pretrained Deep Protein Language Model to Predict Peptide Collision Cross Section

Ayano Nakai-Kasai, Kosuke Ogata, Yasushi Ishihama, Toshiyuki Tanaka

DOI: https://doi.org/10.1101/2024.09.11.612388

2024-09-14

Abstract: The collision cross section (CCS) of peptide ions provides an important separation dimension in liquid chromatography/tandem mass spectrometry-based proteomics that incorporates ion mobility spectrometry (IMS), and its accurate prediction is the basis for advanced proteomics workflows. This paper describes novel experimental data and a novel prediction model for challenging CCS prediction tasks, including longer peptides that tend to have higher charge states. The proposed model is based on a pretrained deep protein language model. While the conventional prediction model requires training from scratch, the proposed model can be trained in less time because it uses the pretrained model as a feature extractor. Experiments with the novel data show that the proposed model drastically reduces training time while matching or exceeding the prediction performance of the conventional method. Our approach opens the possibility of predicting various peptide properties in proteomic liquid chromatography/tandem mass spectrometry experiments in a greener manner.

Bioinformatics

What problem does this paper attempt to address?

The paper addresses the problem of predicting the collision cross section (CCS) of peptide ions. Specifically:

- **Research Background**: In proteomics studies that combine liquid chromatography/tandem mass spectrometry (LC/MS/MS) with ion mobility spectrometry (IMS), CCS provides an important separation dimension. Accurate CCS prediction is crucial for advanced proteomics workflows.
- **Existing Challenges**: Most current prediction methods perform poorly on long peptides, which typically have higher charge states, because both relevant data and effective prediction models are lacking.
- **Proposed Method**: The authors propose PPLN (Pretrained Protein Language Model-based Network), a method built on a pretrained deep protein language model. It takes features extracted from the pretrained model and combines them with other information, such as charge number and mass, to predict the CCS values of peptide ions.
- **Experimental Results**: The new method outperforms the traditional LS-MLR and bidirectional LSTM methods in prediction accuracy and also significantly reduces training time. PPLN performs especially well on long peptides with high charge states.

In summary, the paper uses a pretrained deep protein language model to improve the accuracy of peptide ion CCS prediction and demonstrates the efficiency and accuracy of this method in practical applications.
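The feature-combination idea behind the proposed method (frozen pretrained features joined with charge number and mass, then fed to a small prediction head) can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the per-residue lookup table is a hypothetical stand-in for a real pretrained protein language model, mean pooling and the linear head are assumptions, and all function names and the toy embedding size are invented for the example. Only the monoisotopic residue masses are standard values.

```python
import random

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056
EMBED_DIM = 8  # toy size (assumption); real language models use far larger embeddings


def _residue_vector(aa):
    """Hypothetical stand-in for a frozen pretrained embedding of one residue."""
    rng = random.Random(ord(aa))  # fixed seed per residue, so the table never changes
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


# Built once and never updated: this plays the role of the frozen feature extractor.
FROZEN_TABLE = {aa: _residue_vector(aa) for aa in RESIDUE_MASS}


def extract_features(sequence):
    """Mean-pool per-residue vectors into one fixed-length vector (pooling is an assumption)."""
    vectors = [FROZEN_TABLE[aa] for aa in sequence]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def build_input(sequence, charge):
    """Concatenate the frozen sequence features with charge number and mass."""
    return extract_features(sequence) + [float(charge), peptide_mass(sequence)]


def linear_head(features, weights, bias):
    """The only trainable part in this sketch: a single linear layer."""
    return sum(w * x for w, x in zip(weights, features)) + bias


x = build_input("PEPTIDEK", charge=2)
weights = [0.0] * len(x)  # untrained placeholder weights, not fitted values
prediction = linear_head(x, weights, bias=0.0)
```

Because the feature extractor is fixed, only the small head has parameters to fit, which is the intuition behind the reduced training time reported in the summary above.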

Deep learning the collisional cross sections of the peptide universe from a million experimental values

Deep learning the collisional cross sections of the peptide universe from a million training samples

Ionmob: a Python package for prediction of peptide collisional cross-section values

AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics

DeepIso: A Deep Learning Model for Peptide Feature Detection

ProPept-MT: A Multi-Task Learning Model for Peptide Feature Prediction

Test-Time Training for Deep MS/MS Spectrum Prediction Improves Peptide Identification

Development of a Python-based electron ionization mass spectrometry amino acid and peptide fragment prediction model

Prediction of peptide mass spectral libraries with machine learning

Accurate Prediction of Ion Mobility Collision Cross-Section Using Ion’s Polarizability and Molecular Mass with Limited Data

DeepACLSTM: deep asymmetric convolutional long short-term memory neural models for protein secondary structure prediction

Deep Learning Predicts Non-Normal Peptide FAIMS Mobility Distributions Directly from Sequence

Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction

Predicting Peptide Ionization Efficiencies for Electrospray Ionization Mass Spectrometry Using Machine Learning

ProtTrans and Multi-Window Scanning Convolutional Neural Networks for the Prediction of Protein-Peptide Interaction Sites

Improving Protein-peptide Interface Predictions in the Low Data Regime

Improved Prediction Model of Protein and Peptide Toxicity by Integrating Channel Attention into a Convolutional Neural Network and Gated Recurrent Units

Predicting the Predicted: A Comparison of Machine Learning-Based Collision Cross-Section Prediction Models for Small Molecules

PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features

Predicting multifunctional peptides based on a multi-scale ResNet model combined with channel attention mechanisms