Evaluation of Top-Down Mass Spectral Identification with Homologous Protein Sequences

Ziwei Li, Bo He, Qiang Kou, Zhe Wang, Si Wu, Yunlong Liu, Weixing Feng, Xiaowen Liu
DOI: https://doi.org/10.1186/s12859-018-2462-1
IF: 3.307
2018-01-01
BMC Bioinformatics
Abstract: Background: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area identify proteoforms by searching top-down mass spectra against a protein sequence database. When the species studied in a mass spectrometry experiment has no proteome sequence database of its own, a homologous protein sequence database can be used instead. The accuracy of the homologous protein sequences affects both the sensitivity of proteoform identification and the accuracy of mass shift localization. Results: We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of spectra identified with the homologous database was about half of that identified with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results. Conclusions: Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.
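To make the link between sequence accuracy and mass shifts concrete, the following minimal Python sketch shows how an amino acid substitution between the true protein and its homologous database entry appears as an unexplained mass difference. It is an illustration only and not TopPIC's algorithm: the function names `substitutions` and `mass_shift` and the example sequences are hypothetical, the residue table covers only a handful of amino acids (standard monoisotopic residue masses), and the sketch assumes equal-length sequences with substitutions only, with no insertions or deletions.

```python
# Monoisotopic residue masses in Da for a small subset of amino acids.
# Illustrative subset only; a real tool would cover all residues.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
    "E": 129.04259,
}


def substitutions(observed, database):
    """List (position, database_residue, observed_residue) for each mismatch.

    Assumption: both sequences have the same length (substitutions only).
    """
    if len(observed) != len(database):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (i, d, o)
        for i, (o, d) in enumerate(zip(observed, database))
        if o != d
    ]


def mass_shift(observed, database):
    """Mass difference (observed minus database) caused by the substitutions."""
    return sum(
        RESIDUE_MASS[o] - RESIDUE_MASS[d]
        for _, d, o in substitutions(observed, database)
    )


# Hypothetical example: the sample protein has S where the homolog has A.
observed_seq = "GASVLKE"
database_seq = "GAAVLKE"

diffs = substitutions(observed_seq, database_seq)
shift = mass_shift(observed_seq, database_seq)
print(diffs)             # one substitution at 0-based position 2 (A -> S)
print(round(shift, 5))   # mass of S minus mass of A
```

In this toy case, a search against the homologous sequence has to explain a shift of roughly +15.995 Da at one position. Each additional substitution adds another shift to be localized, which is consistent with the abstract's observation that identification remains feasible when the homologous sequence contains no more than 2 mutations.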