Applications of bioinformatics and machine learning in the analysis of proteomics data
Bohui Li
DOI: https://doi.org/10.33540/1818
2022-12-10
Abstract: In chapter one, I give a general introduction to the basic principles and techniques of MS-based proteomics, quantification strategies, and a generalized shotgun proteomics workflow. I also outline how to analyze proteomics data from a bioinformatics perspective, including normalization, handling of missing values, differential analysis and functional annotation, as well as how to extract biological insight from post-translational modification data. Furthermore, I summarize the basics of machine learning algorithms from the perspective of supervised and unsupervised learning, along with the application of machine learning algorithms to the identification of protein complexes.

In chapter two, we explore the mechanism of drug addiction in melanoma cells that carry a BRAF mutation. We present a proteomics and phosphoproteomics study of BRAFi-addicted melanoma cells (i.e., the 451Lu cell line) in response to BRAFi withdrawal, in which ERK1, ERK2, and JUNB were each genetically silenced using CRISPR-Cas9. We show that inactivation of ERK2 and, to a lesser extent, JUNB prevents drug addiction in these melanoma cells, whereas knockout of ERK1 fails to reverse this phenotype, producing a response similar to that of control cells. Our data indicate that silencing ERK2 or JUNB elicits comparable proteome responses, dominated by the reactivation of cell division. Importantly, we find that EMT activation in drug-addicted melanoma cells upon drug withdrawal is affected by silencing ERK2 but not ERK1. Moreover, we reveal that PIR acts as an effector of ERK2, and phosphoproteome analysis shows that silencing ERK2, but not ERK1, amplifies GSK3 kinase activity. Our results depict possible mechanisms of drug addiction in melanoma, which may help guide therapeutic strategies for drug-resistant melanoma.
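Chapter one names normalization, missing-value handling and differential analysis as core steps of proteomics data analysis. The abstract does not say which methods the thesis uses, so the following is only a minimal sketch under common assumptions: log2-transformed intensities in a proteins × samples matrix, per-sample median normalization, left-censored imputation from a down-shifted, narrowed normal distribution (the shift of 1.8 and width of 0.3 standard deviations are conventional defaults, assumed here), a Welch t-test per protein, and Benjamini-Hochberg correction. The helper names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def median_normalize(x):
    """Subtract each sample's (column's) median, ignoring missing values (NaN)."""
    return x - np.nanmedian(x, axis=0)


def impute_downshifted(x, shift=1.8, width=0.3, seed=0):
    """Fill NaNs in each sample with draws from a narrowed normal distribution
    shifted below the observed mean (assumes left-censored missingness)."""
    rng = np.random.default_rng(seed)
    x = x.copy()
    for j in range(x.shape[1]):
        col = x[:, j]  # a view into x, so the assignment below modifies x
        missing = np.isnan(col)
        observed = col[~missing]
        mu, sd = observed.mean(), observed.std()
        col[missing] = rng.normal(mu - shift * sd, width * sd, missing.sum())
    return x


def welch_test(x, group_a, group_b):
    """Per-protein log2 fold change (A minus B) and Welch t-test p-value."""
    a, b = x[:, group_a], x[:, group_b]
    res = stats.ttest_ind(a, b, axis=1, equal_var=False)
    return a.mean(axis=1) - b.mean(axis=1), res.pvalue


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


# Hypothetical toy data: 50 proteins x 6 samples (3 per condition), log2 scale.
rng = np.random.default_rng(42)
data = rng.normal(20.0, 1.0, size=(50, 6))
data[:5, :3] += 3.0                           # simulated up-regulation in condition A
data[rng.random(data.shape) < 0.1] = np.nan   # roughly 10% missing values

normalized = median_normalize(data)
filled = impute_downshifted(normalized)
log2_fc, pvals = welch_test(filled, [0, 1, 2], [3, 4, 5])
qvals = benjamini_hochberg(pvals)
print(np.round(log2_fc[:5], 2), int((qvals < 0.05).sum()))
```

Down-shifted imputation only makes sense if values are missing because they fall below the detection limit; for values missing at random, other strategies (for example k-nearest-neighbour imputation) may be more appropriate.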
In chapter three, we explore the role of PD-1 in T cell activation by comparing the proteome and phosphoproteome profiles of resting and activated CD8+ T cells in which PD-1 was silenced using CRISPR-Cas9. Our data reveal that activated T cells reprogram their proteome and phosphoproteome, marked by activation of the mTORC1 pathway. Moreover, we find that silencing PD-1 alters the expression of E3 ubiquitin-protein ligases and increases the expression of glucose and lactate transporters. At the phosphoproteome level, PD-1 silencing evokes phosphorylation events in the mTORC1 pathway and activates epidermal growth factor signaling and its downstream MAPK pathway. The data presented in this chapter therefore depict mechanisms by which PD-1 shapes the response of CD8+ T cells to TCR stimulation, which may inform research on immune homeostasis and immune checkpoint therapy.

In chapter four, we construct a comprehensive map of human protein complexes by integrating protein-protein interactions with protein abundance features. A deep learning framework was built to predict protein-protein interactions (PPIs), followed by two-stage clustering to identify protein complexes. Our deep learning classifier significantly outperformed recently published machine learning prediction models, achieving an F1-measure of 0.68, and the pipeline captured 5,010 complexes containing over 9,000 unique proteins. Moreover, this deep learning model enables us to capture poorly characterized interactions and interactions involving co-expressed proteins.
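Chapter four reports an F1-measure of 0.68 for PPI prediction. The F1-measure is the harmonic mean of precision P and recall R, F1 = 2PR/(P + R), which equals 2TP/(2TP + FP + FN). The abstract does not specify the reference set or how negative pairs were defined, so the sketch below only illustrates the standard definition, scoring predicted interactions as undirected protein pairs against a reference set; the function names and protein identifiers are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate_pairs(predicted, reference):
    """Score predicted PPIs against a reference set, treating pairs as undirected."""
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in reference}
    tp = len(pred & ref)  # predicted and present in the reference
    fp = len(pred - ref)  # predicted but absent from the reference
    fn = len(ref - pred)  # in the reference but not predicted
    return precision_recall_f1(tp, fp, fn)


# Hypothetical toy example: ("A", "B") and ("B", "A") count as the same interaction.
predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "B"), ("E", "F")]
p, r, f1 = evaluate_pairs(predicted, reference)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # precision=0.500 recall=0.667 F1=0.571
```

Because precision depends on how many predicted pairs fall outside the reference, the way negative pairs are chosen can change the reported F1 considerably.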