Abstract: Tandem mass spectrometry has emerged as one of the most powerful high-throughput techniques for protein identification and proteomics studies. We aimed to improve existing algorithms and develop new algorithms for tandem mass spectrometry data analysis. We carried out three studies. (1) De novo peptide sequencing via tandem mass spectrometry is of interest in various situations. We developed a dynamic-programming-based suboptimal algorithm for de novo peptide sequencing: we transform an experimental spectrum into a matrix spectrum graph and then give a polynomial-time algorithm that finds all the suboptimal solutions (candidate peptide sequences). The algorithm has been implemented and tested on experimental data. (2) A major known problem in protein and peptide identification by mass spectrometry database search is that the search is too slow, especially against a large sequence database. To address this, we designed speedup algorithms for the search process that combine the suffix tree data structure with the spectrum graph. The basic idea is to use the suffix tree to capture repeat information in the protein database and the spectrum graph to eliminate peptide candidates, so that the correct peptide can be selected more easily by a scoring function. The algorithms can be further extended to database search with post-translational modifications. They were implemented and tested on experimental data. (3) Biological inference from proteomics data generated by mass spectrometers is a challenging problem. In this study, we first introduce some difficult issues in proteomics data analysis and then show how we managed and visualized proteomics data using both in-house and publicly available software tools.
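To make the idea in study (1) concrete, the following is a minimal sketch of dynamic programming over a spectrum graph, written under strong simplifying assumptions and not a reproduction of the algorithm described above. It assumes that every peak is already a noise-free prefix (b-ion-like) residue mass, that only a small subset of amino acids occurs, and that a path's score is the sum of its peak scores. The matrix spectrum graph itself is not modeled. The function name `suboptimal_peptides` and the parameters `max_gap` and `tol` are hypothetical and chosen for illustration only.

```python
# Monoisotopic residue masses (Da) for a small subset of amino acids.
# Assumption: restricting the alphabet keeps the toy example unambiguous.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406,
}


def suboptimal_peptides(peaks, max_gap=0.0, tol=0.01):
    """Enumerate peptides whose path score is within max_gap of the best.

    peaks: list of (prefix_mass, score); the lightest peak is the source
    (mass 0) and the heaviest is the sink (total residue mass).
    Returns a list of (score, peptide), best score first.
    """
    peaks = sorted(peaks)
    n = len(peaks)
    mass = [m for m, _ in peaks]
    score = [s for _, s in peaks]

    # Spectrum graph: edge i -> j when the mass gap matches one residue.
    edges = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gap = mass[j] - mass[i]
            for aa, m in RESIDUE_MASS.items():
                if abs(gap - m) <= tol:
                    edges[i].append((j, aa))

    # Backward DP: best[i] is the best score of any path from i to the sink.
    neg_inf = float("-inf")
    best = [neg_inf] * n
    best[n - 1] = score[n - 1]
    for i in range(n - 2, -1, -1):
        for j, _ in edges[i]:
            if best[j] > neg_inf:
                best[i] = max(best[i], score[i] + best[j])
    if best[0] == neg_inf:
        return []  # no complete path from source to sink

    # Forward enumeration, pruned with best[] as an exact upper bound.
    threshold = best[0] - max_gap - 1e-9
    results = []

    def extend(i, prefix, acc):
        # acc is the score collected on nodes before i.
        if i == n - 1:
            results.append((acc + score[i], prefix))
            return
        for j, aa in edges[i]:
            if best[j] > neg_inf and acc + score[i] + best[j] >= threshold:
                extend(j, prefix + aa, acc + score[i])

    extend(0, "", 0.0)
    return sorted(results, key=lambda r: (-r[0], r[1]))


if __name__ == "__main__":
    # Toy spectrum with two competing paths through the graph.
    toy = [(0.0, 0), (57.02146, 5), (71.03711, 2),
           (128.05857, 4), (215.09060, 0)]
    for s, pep in suboptimal_peptides(toy, max_gap=3):
        print(pep, s)
```

The backward pass is polynomial in the number of peaks. The enumeration step visits only path prefixes that can still reach the score threshold, so its cost grows with the number of candidates reported. Real spectra mix b- and y-ions, missing peaks, and noise, which this sketch deliberately ignores.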

Algorithmic study on mass spectrometry and proteomics
