Network Approaches for Shotgun Proteomics Data Analysis
Bing Zhang, Jing Li, David L. Tabb, Byung-Hoon Park
DOI: https://doi.org/10.1109/IJCBS.2009.74
2009-01-01
Abstract: Shotgun proteomics has emerged as a powerful technology for protein identification with remarkable applications in discovering disease biomarkers. Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence (confident proteins) are reported, while many possible proteins of potential biological interest are eliminated. In biomarker studies, this conservative assembly may prevent us from identifying important biomarker candidates. In this study, we have developed a protein interaction network-assisted complex-enrichment approach (CEA) to improve protein identification by taking into consideration the functional relationships among proteins as embedded in protein interaction networks. CEA is based on the assumption that an eliminated protein is more likely to be present in the original sample if it is a member of a complex for which other members have been confidently identified in the same sample. Using a mouse organ data set and a mouse breast cancer data set, we show that CEA significantly improves protein identification and biological interpretation in shotgun proteomics data. First, we demonstrated the accuracy of CEA through cross-validation studies. CEA achieved an accuracy of 0.90 with a sensitivity of 0.45 in the mouse organ data set. Second, applying CEA to the eliminated proteins rescued 171, 156, and 181 proteins in the brain, placenta, and lung samples, respectively, corresponding to 12%, 11%, and 10% increases in protein identifications in each organ proteome.
Rescued proteins were supported by existing literature or transcriptome profiling studies at levels similar to those of the confidently identified proteins and at a significantly higher level than the abandoned ones. Finally, in the mouse breast cancer data set, CEA increased protein identification by 8% and 23% in the tumor and normal tissues, respectively. Among the 95 rescued proteins in the tumor tissue, 95% and 33% had been reported in cancer- and breast cancer-related publications, respectively, including products from some well-known breast cancer genes such as Ctnnb1 and Top1. Moreover, CEA makes it possible to compare proteomes at a network level. Comparison of the normal and tumor tissue-specific sub-networks identified some important processes involved in tumor biogenesis and progression, such as “apoptosis”, “cell adhesion”, and “Wnt receptor signaling pathway”. In conclusion, CEA is an accurate approach that can be easily incorporated into routine shotgun proteomics protein assembly pipelines to improve protein identification. In addition, CEA generates a network view of the proteins and helps reveal the modular organization of proteins that may underpin the molecular mechanisms of the disease.
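
The rescue idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how complexes are derived from the interaction network or which statistical test is used, so this sketch assumes that complexes are already given as sets of protein identifiers and swaps in a one-sided hypergeometric test for enrichment. The function names (`hypergeom_tail`, `rescue_proteins`), the `alpha` threshold, and the toy protein identifiers are all hypothetical.

```python
from math import comb  # requires Python 3.8+


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n are 'successes'."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def rescue_proteins(complexes, confident, eliminated, universe, alpha=0.05):
    """Return eliminated proteins that belong to a complex enriched in confident proteins.

    Assumption (not from the source): enrichment is judged per complex with a
    one-sided hypergeometric test and no multiple-testing correction.
    """
    universe = set(universe)
    confident = set(confident) & universe
    eliminated = set(eliminated) & universe
    rescued = set()
    for members in complexes.values():
        members = set(members) & universe
        hits = len(members & confident)
        if hits == 0:
            continue  # no confident member, so nothing supports a rescue
        p = hypergeom_tail(hits, len(universe), len(members), len(confident))
        if p < alpha:
            rescued |= members & eliminated
    return rescued


# Toy example with hypothetical identifiers P0..P19.
universe = {f"P{i}" for i in range(20)}
complexes = {
    "complex_A": {"P0", "P1", "P2", "P3"},      # three of four members confident
    "complex_B": {"P10", "P15", "P16", "P17"},  # one of four members confident
}
confident = {"P0", "P1", "P2", "P10"}
eliminated = {"P3", "P15"}

rescued = rescue_proteins(complexes, confident, eliminated, universe)
print(rescued)
```

In this toy input, only the eliminated protein in `complex_A` is rescued, because three of that complex's four members are confidently identified, while `complex_B` has a single confident member and does not pass the threshold. A real pipeline would also need a multiple-testing correction and a principled choice of the background protein set.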