Bayesian Nonparametric Model for the Validation of Peptide Identification in Shotgun Proteomics

Jiyang Zhang, Jie Ma, Lei Dou, Songfeng Wu, Xiaohong Qian, Hongwei Xie, Yunping Zhu, Fuchu He
DOI: https://doi.org/10.1074/mcp.M700558-MCP200
IF: 7.381
2009-01-01
Molecular & Cellular Proteomics
Abstract: Tandem mass spectrometry combined with database searching allows high-throughput identification of peptides in shotgun proteomics. However, validating database search results, a problem for which many solutions have been proposed, still leaves room for improvement in the sensitivity, specificity, and generalizability of the validation algorithms. Here we developed a Bayesian nonparametric (BNP) model for the validation of database search results that incorporates several popular techniques in statistical learning: compression of the feature space with a linear discriminant function, flexible nonparametric estimation of probability density functions to handle the variable probability structure of complex problems, and the Bayesian method for calculating the posterior probability. Importantly, the BNP model is naturally compatible with the popular target-decoy database search strategy. We tested the BNP model on standard-protein and real, complex sample data sets from multiple MS platforms and compared it with PeptideProphet, the cutoff-based method, and a simple nonparametric method that we proposed previously. The BNP model showed superior sensitivity and generalizability on all data sets searched. Some high-quality matches that had been filtered out by other methods were detected and assigned high probabilities by the BNP model. Thus, the BNP model can validate database search results effectively and extract more information from MS/MS data. Molecular & Cellular Proteomics 8:547-557, 2009.
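The three ingredients named in the abstract (a linear discriminant score, nonparametric density estimation, and a Bayesian posterior) can be illustrated with a minimal sketch. This is not the authors' implementation: the two synthetic features, the function names, the Gaussian kernel, the normal-reference bandwidth, and the estimate of the incorrect-match prior as the decoy-to-target count ratio (which assumes a decoy database the same size as the target database) are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(x_a, x_b):
    """Fisher linear discriminant direction separating class a from class b."""
    mean_a, mean_b = x_a.mean(axis=0), x_b.mean(axis=0)
    # Pooled within-class scatter matrix.
    scatter = (np.cov(x_a, rowvar=False) * (len(x_a) - 1)
               + np.cov(x_b, rowvar=False) * (len(x_b) - 1))
    return np.linalg.solve(scatter, mean_a - mean_b)

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth for a 1-D Gaussian kernel (an assumption)."""
    return 1.06 * samples.std(ddof=1) * len(samples) ** (-0.2)

def gaussian_kde(samples, points, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at `points`."""
    z = (points[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def posterior_correct(target_features, decoy_features):
    """Posterior probability that each target match is correct."""
    # 1. Compress the feature space to a single discriminant score.
    w = fisher_direction(target_features, decoy_features)
    t_scores = target_features @ w
    d_scores = decoy_features @ w
    # 2. Nonparametric densities: decoy scores model incorrect matches,
    #    target scores model the mixture of correct and incorrect matches.
    h = rule_of_thumb_bandwidth(np.concatenate([t_scores, d_scores]))
    f_incorrect = gaussian_kde(d_scores, t_scores, h)
    f_all = gaussian_kde(t_scores, t_scores, h)
    # 3. Bayes' rule; the prior of an incorrect match is assumed to be the
    #    decoy-to-target count ratio.
    pi0 = min(1.0, len(d_scores) / len(t_scores))
    return np.clip(1.0 - pi0 * f_incorrect / f_all, 0.0, 1.0)

# Synthetic two-feature data (hypothetical; no real search-engine scores).
rng = np.random.default_rng(0)
correct = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(600, 2))
incorrect = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(400, 2))
decoys = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(400, 2))
targets = np.vstack([correct, incorrect])

posterior = posterior_correct(targets, decoys)
print("mean posterior, simulated correct matches:  ", posterior[:600].mean())
print("mean posterior, simulated incorrect matches:", posterior[600:].mean())
```

In this sketch the decoy matches stand in for the incorrect-match score distribution, which is one way a posterior calculation can be made compatible with a target-decoy search; the paper itself should be consulted for the model the authors actually used.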