Abstract: Accurate protein identification from mass spectrometry (MS) data is fundamental to unraveling the complex roles of proteins in biological systems, and peptide sequencing is a pivotal step in this process. The two main paradigms for peptide sequencing are database search, which matches experimental spectra against peptide sequences from databases, and de novo sequencing, which infers peptide sequences directly from MS data without relying on a pre-constructed database. Although database search methods are highly accurate, they cannot identify novel, modified, or mutated peptides that are absent from the database. In contrast, de novo sequencing is adept at discovering novel peptides but often struggles with missing peaks, which lowers its precision. We introduce SearchNovo, a novel framework that integrates the strengths of database search and de novo sequencing to enhance peptide sequencing. SearchNovo employs an efficient search mechanism to retrieve the most similar peptide spectrum match (PSM) from a database for each query spectrum, followed by a fusion module that uses the reference peptide sequence to guide the generation of the target sequence. Furthermore, we observed that dissimilar (noisy) reference peptides negatively affect model performance. To mitigate this, we constructed pseudo reference PSMs to minimize their impact. Comprehensive evaluations on multiple datasets reveal that SearchNovo significantly outperforms state-of-the-art models. In addition, our analysis indicates that many retrieved spectra contain peaks that are missing from the query spectra, and that the retrieved reference peptides often share common fragments with the target peptides. These are key elements in the recipe for the success of SearchNovo.
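
The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity between spectra is measured, so the choices here are assumptions made for illustration only: spectra are binned at 1 m/z, compared by cosine similarity, and searched by brute force. The names `bin_spectrum`, `cosine`, and `retrieve_reference` are hypothetical, and the peak lists are toy values, not real fragment masses.

```python
import math
from collections import defaultdict


def bin_spectrum(mz_values, intensities, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins and L2-normalise the result."""
    bins = defaultdict(float)
    for mz, intensity in zip(mz_values, intensities):
        bins[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in bins.values()))
    if norm == 0:
        return {}
    return {k: v / norm for k, v in bins.items()}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (dicts)."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def retrieve_reference(query, library):
    """Return (peptide, score) for the library PSM most similar to the query.

    Assumption: brute-force scan; the search used by SearchNovo may differ.
    """
    q = bin_spectrum(*query)
    best_peptide, best_score = None, -1.0
    for peptide, mz_values, intensities in library:
        score = cosine(q, bin_spectrum(mz_values, intensities))
        if score > best_score:
            best_peptide, best_score = peptide, score
    return best_peptide, best_score


# Toy library of PSMs: (peptide, m/z list, intensity list). Values are made up.
library = [
    ("PEPTIDE", [98.1, 227.1, 324.2, 425.2], [0.4, 1.0, 0.7, 0.5]),
    ("SEQVENCE", [88.0, 217.1, 345.2, 444.2], [0.6, 0.9, 1.0, 0.3]),
]

# Query spectrum that lacks the 324.2 peak present in the first library entry.
query = ([98.1, 227.1, 425.2], [0.5, 1.0, 0.4])

peptide, score = retrieve_reference(query, library)
```

In this toy case the retrieved spectrum carries a peak that the query lacks, which is the kind of complementary information the abstract credits for the method's success. How the reference peptide is then fused into sequence generation is not described in enough detail here to sketch.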

Protein identification with deep learning: from abc to xyz

AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information

ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects

Deep learning methods for de novo peptide sequencing

De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

DeepNovoV2: Better de novo peptide sequencing with deep learning

DeepProtein: Deep Learning Library and Benchmark for Protein Sequence Learning

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

π-PrimeNovo: An Accurate and Efficient Non-Autoregressive Deep Learning Model for De Novo Peptide Sequencing

DeepIso: A Deep Learning Model for Peptide Feature Detection

MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry

PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models

Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

Deep Learning Powers Protein Identification from Precursor MS Information

Deep learning neural network tools for proteomics

Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo

Deep Learning in Proteomics

Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing

Protein remote homology detection and structural alignment using deep learning