Abstract: Tandem mass spectrometry has been the principal method in shotgun proteomics for peptide and protein identification. However, it remains unknown which of the identifications reported by proteome search engines are incorrect, so further validation methods are needed. We previously proposed a validation method, pValid, but its scope of application is limited because two of its features depend on open database search and on sub-optimal peptide candidates for tandem mass spectra, and its performance on complex datasets still has room for improvement. In this study, we developed a more comprehensive validation method, pValid 2, which overcomes these limitations by removing the two features and introducing a new feature based on the retention time predicted by pPredRT, a deep learning-based method. pValid 2 yielded an average false positive rate of 0.03% and an average false negative rate of 1.37% on three testing datasets, better than those of pValid, and flagged 8.47% to 11.31% more incorrect identifications than pValid on two complex datasets. Moreover, pValid 2 flagged almost all decoy identifications when validating the open-search datasets. In addition, validation of identifications given by MaxQuant and MS-GF+ was implemented in pValid 2, and the validation results showed that pValid 2 performed dramatically better than three metabolic labeling validation methods. Considering also its cost-effectiveness as a purely computational approach, pValid 2 has the potential to become a widely used validation tool for peptide identifications from any proteome search engine in shotgun proteomics.
SIGNIFICANCE: Identification results given by shotgun proteomics are vital to life science research. The correctness of identifications strongly affects the precision of subsequent studies of protein structures and functions, protein-protein interactions, pathogenic mechanisms, and targeted drugs. Thus, validating the correctness of identifications is crucial and urgent. In 2019, we developed an identification credibility validation method named pValid, whose false positive rate (FPR) is 0.03% and false negative rate (FNR) is 1.79%, comparable to those of the gold standard, i.e., the synthetic-peptide validation method. However, pValid can only be used to validate results from pFind, and its validation performance on a few complex datasets still has room for improvement. In this submission, we therefore propose pValid 2, a more comprehensive computational validation method that can validate identifications from any proteome search engine with increased discriminating power.
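The FPR and FNR figures quoted above can be read against the standard confusion-matrix definitions. The following is a minimal sketch of those textbook definitions only, not the pValid 2 evaluation code. The function name `error_rates` and the boolean encoding are hypothetical, and the choice of which class counts as "positive" (for example, a credible identification versus a suspicious one) is an assumption left to the caller, since the text here does not specify it.

```python
def error_rates(actual, predicted):
    """Return (FPR, FNR) from two equal-length sequences of booleans.

    True marks the positive class. Which class is "positive" is an
    assumption the caller must fix; the abstract does not state it.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)

    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("need at least one actual positive and one actual negative")

    fpr = fp / (fp + tn)  # fraction of actual negatives called positive
    fnr = fn / (fn + tp)  # fraction of actual positives called negative
    return fpr, fnr


# Toy labels, made up purely for illustration.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
fpr, fnr = error_rates(actual, predicted)
print(f"FPR = {fpr:.2%}, FNR = {fnr:.2%}")
```

Averaging these two rates over several benchmark datasets would give summary figures of the same kind as the ones reported in the abstract.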

pValid 2: A deep learning based validation method for peptide identification in shotgun proteomics with increased discriminating power
