Deep learning methods for de novo peptide sequencing

Wout Bittremieux, Varun Ananth, William E. Fondrie, Carlo Melendez, Marina Pominova, Justin Sanders, Bo Wen, Melih Yilmaz, William Stafford Noble
DOI: https://doi.org/10.26434/chemrxiv-2024-l6wnt
2024-05-27
Abstract: Protein tandem mass spectrometry data is most often interpreted by matching observed mass spectra to a protein database derived from the reference genome of the sample being analyzed. In many application domains, however, a relevant protein database is unavailable or incomplete, and in such settings de novo sequencing is required. Since the introduction of the DeepNovo algorithm in 2017, the field of de novo sequencing has been dominated by deep learning methods, which use large amounts of labeled mass spectrometry data to train multi-layer neural networks to translate from observed mass spectra to corresponding peptide sequences. Here, we describe these deep learning methods, outline procedures for evaluating their performance, and discuss the challenges in the field, both in terms of methods development and evaluation protocols.
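The abstract refers to procedures for evaluating de novo sequencing performance. The following is a minimal sketch under an assumed convention that is common in this area but not necessarily the exact protocol the paper describes. A prediction counts as correct only if it exactly matches the label peptide (for example, one obtained by database search), with isobaric leucine (L) and isoleucine (I) treated as identical. Precision is measured over the predictions kept at a score threshold, and recall over all labeled spectra. The names `normalize` and `peptide_precision_recall` and the toy peptides are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def normalize(peptide: str) -> str:
    """Map isoleucine to leucine, since the two residues have identical mass."""
    return peptide.replace("I", "L")


def peptide_precision_recall(
    psms: Sequence[Tuple[str, Optional[str], float]],
    score_threshold: float,
) -> Tuple[float, float]:
    """Return (precision, recall) for predictions scoring >= score_threshold.

    Each entry is (label_peptide, predicted_peptide or None, score).
    Precision = correct / predictions kept; recall = correct / all spectra.
    (Hypothetical helper illustrating an exact-match, I/L-equivalent criterion.)
    """
    kept = [(t, p) for t, p, s in psms if p is not None and s >= score_threshold]
    correct = sum(normalize(t) == normalize(p) for t, p in kept)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / len(psms) if psms else 0.0
    return precision, recall


# Hypothetical toy data: one entry per labeled spectrum.
psms = [
    ("PEPTIDEK", "PEPTLDEK", 0.9),  # I/L swap counts as correct
    ("ACDEFGHK", "ACDEFGHR", 0.8),  # wrong C-terminal residue
    ("MLKVR", None, 0.0),           # no prediction made
    ("GASPVTK", "GASPVTK", 0.4),    # correct but below the threshold
]
print(peptide_precision_recall(psms, score_threshold=0.5))  # → (0.5, 0.25)
```

Sweeping `score_threshold` from high to low trades precision for recall. Finer-grained, amino-acid-level metrics are also used in practice, but they are not sketched here.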
Chemistry