Abstract: The potential for a significant number of false positives (FPs) among peptide-spectrum matches (PSMs) obtained by proteomic database search is well recognized. Among the attempts to assess FPs, the concomitant use of target and decoy databases is widely practiced. By adjusting filtering criteria, FPs and the false discovery rate (FDR) can be controlled at a desired level. Although the target-decoy approach is gaining in popularity, subtle differences in decoy construction (e.g., reversing vs stochastic methods), rate calculation (e.g., total vs unique PSMs), or searching (separate vs composite) exist among various implementations. In the present study, we evaluated the effects of these differences on FP and FDR estimations using a rat kidney protein sample and the SEQUEST search engine as an example. Regarding decoy construction, we found that, when a single scoring filter (XCorr) was used, stochastic methods generated higher estimates of FPs and FDR than sequence-reversing methods, likely due to an increase in unique peptides. This higher estimate could largely be attenuated by creating decoy databases similar in effective size but not by a simple normalization with a unique-peptide coefficient. When multiple filters were applied, the differences between reversing and stochastic methods diminished significantly, suggesting that multiple filters reduce the dependency on how a decoy is constructed. For a fixed set of filtering criteria, FDR and FPs estimated using unique PSMs were almost twice those estimated using total PSMs. This higher estimate seemed to depend on the data acquisition setup. As for the difference between separate and composite searches, in general, the FDR estimated from the separate search was about three times that from the composite search. The degree of difference gradually decreased as the filtering criteria became more stringent. Paradoxically, the estimated true positives in the separate search were higher when multiple filters were used. By analyzing a standard protein mixture, we demonstrated that the higher estimates of FDR and FPs in the separate search likely reflected an overestimation, which could be corrected with a simple merging procedure. Our study illustrates the relative merits of different implementations of the target-decoy strategy, which are worth considering when large-scale proteomic biomarker discovery is attempted.
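
To make the distinctions drawn in the abstract concrete, the following is a minimal Python sketch of how total vs unique PSM counting and separate vs merged searches can change an FDR estimate. It rests on several assumptions that are not taken from the study: the estimate FDR = decoy hits / target hits (other conventions exist), a single XCorr cutoff as the only filter, and a merge step that keeps the higher-scoring of the target and decoy hits for each spectrum, which may differ from the merging procedure the authors used. The `PSM` class, the function names, and all toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    spectrum_id: str  # hypothetical identifier of the MS/MS spectrum
    peptide: str      # matched peptide sequence
    xcorr: float      # SEQUEST cross-correlation score
    is_decoy: bool    # True if the match came from the decoy database


def estimate_fdr(psms, xcorr_min, unique=False):
    """Return (decoys, targets, fdr) for PSMs with xcorr >= xcorr_min.

    Assumption: FDR is estimated as decoys / targets. With unique=True,
    each distinct peptide sequence is counted once instead of every PSM.
    """
    passing = [p for p in psms if p.xcorr >= xcorr_min]
    if unique:
        targets = len({p.peptide for p in passing if not p.is_decoy})
        decoys = len({p.peptide for p in passing if p.is_decoy})
    else:
        targets = sum(not p.is_decoy for p in passing)
        decoys = sum(p.is_decoy for p in passing)
    fdr = decoys / targets if targets else 0.0
    return decoys, targets, fdr


def merge_separate_searches(target_psms, decoy_psms):
    """Keep, per spectrum, only the higher-scoring target or decoy hit.

    Assumes each search reports one top hit per spectrum. Ties keep the
    target hit, an arbitrary choice made for this sketch.
    """
    best = {p.spectrum_id: p for p in target_psms}
    for d in decoy_psms:
        current = best.get(d.spectrum_id)
        if current is None or d.xcorr > current.xcorr:
            best[d.spectrum_id] = d
    return list(best.values())


# Toy data, invented for illustration only (not from the study).
target_hits = [
    PSM("s1", "PEPTIDEK", 3.2, False),
    PSM("s2", "PEPTIDEK", 2.9, False),
    PSM("s3", "SAMPLER", 2.1, False),
    PSM("s4", "ANOTHERK", 1.8, False),
]
decoy_hits = [
    PSM("s1", "KEDITPEP", 2.2, True),
    PSM("s2", "KEDITPEP", 1.0, True),
    PSM("s3", "RELPMAS", 2.4, True),
    PSM("s4", "KREHTONA", 1.5, True),
]

separate_total = estimate_fdr(target_hits + decoy_hits, xcorr_min=2.0)
separate_unique = estimate_fdr(target_hits + decoy_hits, xcorr_min=2.0, unique=True)
merged_total = estimate_fdr(
    merge_separate_searches(target_hits, decoy_hits), xcorr_min=2.0
)

print("separate, total PSMs :", separate_total)
print("separate, unique PSMs:", separate_unique)
print("merged, total PSMs   :", merged_total)
```

In this toy input the decoy hit for spectrum `s1` passes the cutoff in the separate search even though the target hit for the same spectrum scores higher. The merge step discards it, so the merged estimate comes out lower than the separate one. The numbers only illustrate the mechanics and say nothing about the magnitudes reported in the study.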

Decoy methods for assessing false positives and false discovery rates in shotgun proteomics.
