DDA-BERT: leveraging transformer architecture pre-training for data-dependent acquisition mass spectrometry-based proteomics

Jun A, Pu Liu, Yingying Sun, Jiaying Lin, Xiaofan Zhang, Zongxiang Nie, Yuqi Zhang, Ziyuan Xing, Yi Chen, Tiannan Guo
DOI: https://doi.org/10.1101/2024.11.13.623394
2024-11-15
Abstract: In data-dependent acquisition mass spectrometry (DDA-MS)-based proteomics, machine learning-based rescoring methods are often employed to integrate multiple scores that measure the quality of peptide-spectrum matches (PSMs) from different aspects. Existing rescoring tools are limited by manual feature extraction, shallow machine learning models, and limited training data. Here, we introduce DDA-BERT, a transformer-based end-to-end deep learning model trained on 95 million human spectra for PSM rescoring. DDA-BERT demonstrates superior performance across various sample types, mass spectrometer platforms, trace sample proteomics, and proteome data from multiple species. It consistently outperforms state-of-the-art methods of its kind for DDA-MS spectra analysis, with up to 103.5% more protein group identifications. For single-cell proteomics, DDA-BERT identifies up to 85.6% more protein groups than existing tools. In addition, DDA-BERT can be effectively extended to the proteomes of non-human species. DDA-BERT offers a robust and scalable solution for enhancing PSM rescoring in proteomics.
Bioinformatics