AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information

Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li
2024-01-01
Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide, using CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides of the test sets. Moreover, AdaNovo excels in identifying amino acids with PTMs and exhibits robustness against data noise. The supplementary materials contain the official code.
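
To make the CMI quantity concrete, the sketch below computes the standard pointwise conditional mutual information between a spectrum x and each amino acid y_j, namely log p(y_j | x, y_<j) - log p(y_j | y_<j), and sums it over the sequence to get a peptide-level value. It is a minimal illustration and not the official AdaNovo code: the function names and the probability values are hypothetical, and it assumes the two conditional probabilities come from a spectrum-conditioned decoder and a sequence-only decoder. The abstract does not say how CMI is turned into training weights, so that step is left out.

```python
import math

def pointwise_cmi(p_with_spectrum, p_without_spectrum):
    """Per-amino-acid pointwise CMI: log p(y_j | x, y_<j) - log p(y_j | y_<j).

    Both arguments are lists of probabilities assigned to the ground-truth
    amino acid at each position (hypothetical inputs for illustration).
    """
    return [math.log(p) - math.log(q)
            for p, q in zip(p_with_spectrum, p_without_spectrum)]

def peptide_level_mi(amino_acid_cmi):
    """By the chain rule, the per-position values sum to log p(y | x) - log p(y)."""
    return sum(amino_acid_cmi)

# Hypothetical probabilities for a three-residue peptide.
p_given_spectrum = [0.9, 0.6, 0.2]   # assumed spectrum-conditioned decoder
p_sequence_only = [0.3, 0.6, 0.4]    # assumed sequence-only decoder

aa_cmi = pointwise_cmi(p_given_spectrum, p_sequence_only)
peptide_mi = peptide_level_mi(aa_cmi)
```

A positive value means the spectrum makes that amino acid more likely than the sequence context alone, zero means the spectrum adds nothing, and a negative value means the spectrum argues against the labeled residue, which is one plausible signature of a noisy PSM.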