APIR: Aggregating Universal Proteomics Database Search Algorithms for Peptide Identification with FDR Control

Yiling Elaine Chen,Xinzhou Ge,Kyla Woyshner,MeiLu McDermott,Antigoni Manousopoulou,Scott B. Ficarro,Jarrod A. Marto,Kexin Li,Leo David Wang,Jingyi Jessica Li
DOI: https://doi.org/10.1101/2021.09.08.459494
2024-03-21
Abstract:Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. The state-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approaches can aggregate all user-specified database search algorithms with a guaranteed increase in the number of identified peptides and control on the false discovery rate (FDR). To fill in this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under an FDR threshold, APIR is guaranteed to identify at least as many, if not more, peptides as individual database search algorithms do. Evaluation of APIR on a complex proteomics standard shows that APIR outpowers individual database search algorithms and empirically controls the FDR. Real data studies show that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analysis, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at .
Bioinformatics
What problem does this paper attempt to address?