Abstract: In mass spectrometry-based proteomics, the identification and quantification of peptides and proteins are usually done using database search algorithms or spectral library matching. Deep learning algorithms can help improve the identification rates of peptides and proteins by generating high-fidelity theoretical spectra, which can serve as the basis of a more complete spectral library than those presently available. Current methods focus on predicting only backbone ions, such as y- and b-ions. However, the inclusion of non-backbone ions is necessary to truly improve spectral library matching. Here we focus on providing a user-friendly machine learning workflow, which we call Complete Spectrum Predictor (CoSpred). Using CoSpred, users can create their own machine-learning-compatible training dataset and then train a machine learning model to predict both backbone and non-backbone ions. For the model, a transformer encoder architecture is used to predict the complete MS/MS spectrum from a given peptide sequence. This model does not require background knowledge of fragment ion annotations or fragmentation rules. The model outputs the set of pairs (M_i, I_i), where M_i is the m/z (mass-to-charge ratio) of a peak in the spectrum and I_i is the intensity of that peak. The model presented here for validation was trained on the dataset available in the MassIVE data repository and shows superior performance on various metrics comparing the true and predicted spectra (e.g. precision/recall for mass, cosine similarity for peak intensity). Furthermore, CoSpred can be used to create custom models that allow for accurate spectrum prediction under different experimental conditions. In addition to the transformer model provided in the package, the code is built modularly so that alternate ML models can be easily "plugged in".
The CoSpred workflow (preprocessing -> training -> inference) provides a path for state-of-the-art ML capabilities to be more accessible to proteomics scientists.
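
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how precision/recall for mass and cosine similarity for peak intensity could be computed between a true and a predicted spectrum, each given as a list of (m/z, intensity) pairs. It is not taken from the CoSpred code: the function names, the greedy peak-matching strategy, and the 0.02 m/z tolerance are all assumptions chosen for illustration.

```python
import math

def match_peaks(true_mz, pred_mz, tol=0.02):
    """Greedily pair each predicted m/z with the closest unused true m/z
    that lies within `tol` (hypothetical tolerance, in m/z units)."""
    used = set()
    pairs = []
    for j, pm in enumerate(pred_mz):
        best, best_d = None, tol
        for i, tm in enumerate(true_mz):
            d = abs(pm - tm)
            if i not in used and d <= best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            pairs.append((best, j))
    return pairs

def mass_precision_recall(true_peaks, pred_peaks, tol=0.02):
    """Precision: fraction of predicted peaks matched to a true peak.
    Recall: fraction of true peaks matched by a predicted peak."""
    true_mz = [m for m, _ in true_peaks]
    pred_mz = [m for m, _ in pred_peaks]
    n_matched = len(match_peaks(true_mz, pred_mz, tol))
    precision = n_matched / len(pred_mz) if pred_mz else 0.0
    recall = n_matched / len(true_mz) if true_mz else 0.0
    return precision, recall

def intensity_cosine(true_peaks, pred_peaks, tol=0.02):
    """Cosine similarity of intensities; unmatched peaks contribute
    zero to the dot product but still count in the norms."""
    true_mz = [m for m, _ in true_peaks]
    pred_mz = [m for m, _ in pred_peaks]
    pairs = match_peaks(true_mz, pred_mz, tol)
    dot = sum(true_peaks[i][1] * pred_peaks[j][1] for i, j in pairs)
    norm_t = math.sqrt(sum(x * x for _, x in true_peaks))
    norm_p = math.sqrt(sum(x * x for _, x in pred_peaks))
    if norm_t == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_t * norm_p)

# Toy spectra (made-up values, for illustration only).
true_spectrum = [(175.119, 0.4), (276.155, 1.0), (405.198, 0.6)]
pred_spectrum = [(175.120, 0.5), (276.150, 0.9), (500.000, 0.2)]

precision, recall = mass_precision_recall(true_spectrum, pred_spectrum)
cosine = intensity_cosine(true_spectrum, pred_spectrum)
print(f"precision={precision:.3f} recall={recall:.3f} cosine={cosine:.3f}")
```

In this sketch, a predicted peak with no true counterpart lowers precision, while a missed true peak lowers recall; both also reduce the cosine score because they add to the norms without adding to the dot product.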

CoSpred: Machine learning workflow to predict tandem mass spectrum in proteomics
