How to train a post-processor for tandem mass spectrometry proteomics database search while maintaining control of the false discovery rate

Jack Andrew Freestone, Lukas Käll, William Stafford Noble, Uri Keich
DOI: https://doi.org/10.1101/2023.10.26.564068
2024-08-26
Abstract: Decoy-based methods are a popular choice for the statistical validation of peptide detections in tandem mass spectrometry proteomics data. Such methods can achieve a substantial boost in statistical power when coupled with post-processors such as Percolator that use auxiliary features to learn a better-discriminating scoring function. However, we recently showed that Percolator can struggle to control the false discovery rate (FDR) when reporting the list of discovered peptides. To address this problem, we introduce Percolator-RESET, which is an adaptation of our recently developed RESET meta-procedure to the peptide detection problem. Specifically, Percolator-RESET fuses Percolator's iterative SVM training procedure with RESET's general framework of determining the list of reported discoveries in a target-decoy competition setup, where each putative discovery is augmented with a list of relevant features. Percolator-RESET operates in both a standard single-decoy mode and a two-decoy mode, the latter requiring the generation of two decoys per target. We demonstrate that Percolator-RESET controls the FDR in both modes, both theoretically and empirically, while typically reporting only a marginally smaller number of discoveries than Percolator in single-decoy mode. The two-decoy mode is marginally more powerful than both Percolator and the single-decoy mode and exhibits less variability than the latter.
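The target-decoy competition setup that the abstract refers to can be illustrated with a minimal sketch. This is plain single-decoy competition with the commonly used (decoys + 1) / targets estimate, not Percolator-RESET itself; the function name `tdc_discoveries`, its arguments, and the tie-handling rule are assumptions made for illustration only.

```python
def tdc_discoveries(target_scores, decoy_scores, alpha):
    """Sketch of single-decoy target-decoy competition (TDC).

    target_scores[i] and decoy_scores[i] are the scores of the i-th target
    and its paired decoy (higher is better). Returns the sorted indices of
    targets reported at FDR threshold alpha.
    """
    winners = []
    for i, (t, d) in enumerate(zip(target_scores, decoy_scores)):
        # Assumption: ties count as decoy wins (conservative simplification;
        # in practice ties are usually broken at random).
        if t > d:
            winners.append((t, True, i))
        else:
            winners.append((d, False, i))

    # Rank the winners of each competition by score, best first.
    winners.sort(key=lambda w: w[0], reverse=True)

    n_targets = 0
    n_decoys = 0
    cutoff = 0
    for k, (_, is_target, _) in enumerate(winners, start=1):
        if is_target:
            n_targets += 1
        else:
            n_decoys += 1
        # Estimated FDR among the top k winners: (decoy wins + 1) / target wins.
        if (n_decoys + 1) / max(n_targets, 1) <= alpha:
            cutoff = k  # keep the largest k that satisfies the threshold

    # Report only the target wins above the cutoff.
    return sorted(i for _, is_target, i in winners[:cutoff] if is_target)


if __name__ == "__main__":
    targets = [9.0, 8.0, 7.0, 6.0, 2.0]
    decoys = [1.0, 1.5, 0.5, 6.5, 1.0]
    print(tdc_discoveries(targets, decoys, alpha=0.4))
```

In this toy input the fourth target loses to its decoy, so that decoy win raises the estimated FDR for every cutoff below it. A post-processor such as Percolator would first re-score targets and decoys using auxiliary features; the sketch above covers only the final competition and thresholding step.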
Bioinformatics