Algorithmic study on mass spectrometry and proteomics

Ting Chen, Bingwen Lu
2005-01-01
Abstract: Tandem mass spectrometry has emerged as one of the most powerful high-throughput techniques for protein identification and proteomics research. We aimed to improve existing algorithms and to develop new ones for analyzing tandem mass spectrometry data. We carried out three studies:

(1) De novo peptide sequencing via tandem mass spectrometry is of interest in various situations, for example when a peptide is not present in the sequence database. We developed a suboptimal algorithm for de novo peptide sequencing based on dynamic programming. We transform an experimental spectrum into a matrix spectrum graph and then give a polynomial-time suboptimal algorithm that finds all suboptimal solutions (candidate peptide sequences). The algorithm has been implemented and tested on experimental data.

(2) A major known problem in protein and peptide identification by mass spectrometry database search is that the search is too slow, especially against a large sequence database. To address this, we designed algorithms that speed up the search process by combining a suffix tree data structure with a spectrum graph. The basic idea is to use the suffix tree to capture repeated substrings in the protein database and to use the spectrum graph to eliminate peptide candidates, so that a scoring function can more easily select the correct peptide. The algorithms can be further extended to database search with post-translational modifications. They were implemented and tested on experimental data.

(3) Biological inference from proteomics data generated by mass spectrometers is a challenging problem. In this study, we first introduce several difficult issues in proteomics data analysis and then show how we managed and visualized proteomics data using both in-house and publicly available software tools.
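The spectrum-graph idea behind study (1) can be illustrated with a minimal sketch. It assumes the spectrum has already been converted into a list of prefix residue masses (real b/y-ion offsets, charge states, and noise peaks are ignored). It builds an ordinary spectrum graph, not the authors' matrix spectrum graph, and swaps in a simple k-best dynamic program over the graph in place of their suboptimal algorithm. The function names, the tolerance `tol`, and the one-point-per-observed-peak score are hypothetical.

```python
from heapq import nlargest

# Monoisotopic residue masses (Da). I and L are isobaric, so only L is listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(prefix_masses, parent_mass, tol=0.02):
    """Nodes are 0, every observed prefix mass, and the parent residue mass.
    An edge i -> j labeled aa exists when nodes[j] - nodes[i] matches aa's mass."""
    nodes = sorted({0.0, parent_mass, *prefix_masses})
    incoming = {j: [] for j in range(len(nodes))}  # incoming edges per node
    for j in range(len(nodes)):
        for i in range(j):
            diff = nodes[j] - nodes[i]
            for aa, m in RESIDUE_MASS.items():
                if abs(diff - m) <= tol:
                    incoming[j].append((i, aa))
    return nodes, incoming


def top_k_sequences(prefix_masses, parent_mass, k=3, tol=0.02):
    """k-best dynamic programming over the DAG: keep the k best partial paths
    per node. Hypothetical score: number of observed peaks on the path."""
    nodes, incoming = build_spectrum_graph(prefix_masses, parent_mass, tol)
    observed = set(prefix_masses)
    best = [[] for _ in nodes]
    best[0] = [(0, "")]  # node 0 is mass 0.0, the empty prefix
    for j in range(1, len(nodes)):
        gain = 1 if nodes[j] in observed else 0
        candidates = [(score + gain, seq + aa)
                      for i, aa in incoming[j]
                      for score, seq in best[i]]
        best[j] = nlargest(k, candidates)
    return best[nodes.index(parent_mass)]
```

For example, `top_k_sequences([57.02146, 128.05857], 215.0906)` returns `[(2, 'GAS'), (1, 'QS')]`. The runner-up `QS` appears because G+A (128.05857 Da) and Q (128.05858 Da) are nearly isobaric, one kind of ambiguity that makes reporting suboptimal candidates useful.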
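The database-search idea in study (2) can be sketched as follows, assuming a small in-memory protein dictionary. For brevity it builds an uncompressed suffix trie (quadratic space), a simplified stand-in for a real suffix tree; what it illustrates is that a substring shared by several proteins is walked only once. The `supported_masses` list stands in for spectrum-graph node masses, and `max_misses` is a hypothetical pruning rule: a candidate is eliminated as soon as more than `max_misses` of its prefix masses lack support. None of these names come from the paper.

```python
# Monoisotopic residue masses (Da); I and L are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


class TrieNode:
    __slots__ = ("children", "proteins")

    def __init__(self):
        self.children = {}     # residue letter -> TrieNode
        self.proteins = set()  # IDs of proteins containing this substring


def build_suffix_trie(proteins):
    """Insert every suffix of every protein; a repeated substring shares one path."""
    root = TrieNode()
    for pid, seq in proteins.items():
        for start in range(len(seq)):
            node = root
            for aa in seq[start:]:
                node = node.children.setdefault(aa, TrieNode())
                node.proteins.add(pid)
    return root


def search(root, parent_mass, supported_masses, tol=0.02, max_misses=1):
    """Depth-first walk returning (peptide, protein IDs) whose residue mass
    matches parent_mass. Branches are cut when they become too heavy or when
    too many prefix masses lack support in supported_masses."""
    hits = []

    def supported(mass):
        return any(abs(mass - m) <= tol for m in supported_masses)

    def dfs(node, peptide, mass, misses):
        for aa, child in node.children.items():
            residue = RESIDUE_MASS.get(aa)
            if residue is None:
                continue  # skip ambiguous letters such as X
            new_mass = mass + residue
            if new_mass > parent_mass + tol:
                continue  # residue masses are positive, so extensions only get heavier
            if abs(new_mass - parent_mass) <= tol:
                hits.append((peptide + aa, sorted(child.proteins)))
                continue  # any further residue would overshoot parent_mass
            new_misses = misses + (0 if supported(new_mass) else 1)
            if new_misses <= max_misses:
                dfs(child, peptide + aa, new_mass, new_misses)

    dfs(root, "", 0.0, 0)
    return hits
```

With `proteins = {"P1": "MGASK", "P2": "TGASR", "P3": "WAGSW"}`, supported masses `[57.02146, 128.05857]`, and a parent mass of 215.0906, `max_misses=0` keeps only `("GAS", ["P1", "P2"])`, found in a single walk for both proteins. `max_misses=1` also admits `AGS` from P3, whose first prefix mass (A, 71.04 Da) has no supporting peak.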
What problem does this paper attempt to address?