Integrating de Novo Sequencing and Database Search for Peptide Identification

Lei Xin, Bin Ma, Baozhen Shan
2012-01-01
Journal of Biomolecular Techniques
Abstract: INTRODUCTION: Peptide identification with high sensitivity and accuracy is vital in mass spectrometry-based proteomics. One way to increase confidence in peptide identification is to use high-resolution tandem mass spectrometry at both the precursor and fragment levels. A workflow is presented that combines de novo sequencing and database search for peptide identification with high-resolution data. METHODS: The workflow integrates de novo sequencing and database searching for peptide identification. It contains 3 steps. 1. Perform a database search with all MS/MS spectra against a protein sequence database. Database peptides are selected at 1% FDR. 2. For spectra left unidentified in step 1, perform a modification search using confident de novo tags, with all modifications in the Unimod database turned on. Peptides containing unsuspected modifications are selected at 1% FDR. 3. Select the spectra that have high-confidence de novo sequences but were not identified in the above steps. RESULTS: The workflow was implemented in PEAKS. It was tested on a high-resolution MS dataset published by D.S. Kelkar in MCP, in which 253394 MS/MS spectra were obtained from cell lysates of Mycobacterium tuberculosis with strong cation exchange chromatography on an LTQ-Orbitrap Velos. In step 1, 112480 peptide-spectrum matches (PSMs) were identified by database sequence searching, with precursor mass errors of 5 ppm and fragment mass errors of 40 ppm. Of these 112480 spectra, 28120 had de novo confidence scores (ALC) greater than 70%. Compared with the database peptides, 94% of the amino acids in the de novo sequences were consistent. In step 2, 5706 PSMs were identified by modification search, with precursor mass errors of 5 ppm and fragment mass errors of 45 ppm. In addition, 3976 PSMs with ALC greater than 70% were selected in step 3, with precursor mass errors of 5 ppm and fragment mass errors of 42 ppm. CONCLUSION: Integrating de novo sequencing and database search improves peptide identification.
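The three-step selection logic described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PEAKS implementation: the `Match` record, `filter_at_fdr`, and `run_workflow` are hypothetical names, and the FDR is estimated here with a simple target-decoy count, which the abstract does not specify. The search engines themselves are out of scope; the sketch only takes their scored matches as input.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Match:
    """Hypothetical scored peptide-spectrum match from a search step."""
    spectrum_id: int
    score: float
    is_decoy: bool  # assumption: FDR is estimated by target-decoy counting


def filter_at_fdr(matches: List[Match], max_fdr: float = 0.01) -> List[Match]:
    """Keep the longest score-ranked prefix with decoys/targets <= max_fdr."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    # Report only target matches inside the accepted prefix.
    return [m for m in ranked[:best_cut] if not m.is_decoy]


def run_workflow(
    db_matches: List[Match],
    mod_matches: List[Match],
    alc_by_spectrum: Dict[int, float],
    max_fdr: float = 0.01,
    min_alc: float = 70.0,
) -> Tuple[Set[int], Set[int], List[int]]:
    # Step 1: database search results, selected at the FDR threshold.
    step1 = {m.spectrum_id for m in filter_at_fdr(db_matches, max_fdr)}

    # Step 2: modification search, restricted to spectra not identified in step 1.
    remaining = [m for m in mod_matches if m.spectrum_id not in step1]
    step2 = {m.spectrum_id for m in filter_at_fdr(remaining, max_fdr)}

    # Step 3: still-unidentified spectra whose de novo score (ALC, in percent)
    # exceeds the cutoff.
    identified = step1 | step2
    step3 = sorted(
        sid for sid, alc in alc_by_spectrum.items()
        if alc > min_alc and sid not in identified
    )
    return step1, step2, step3
```

Each step only sees spectra that earlier steps left unidentified, so a spectrum is counted in at most one of the three result sets. The 1% FDR and 70% ALC defaults mirror the thresholds quoted in the abstract.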