EMBER: Multi-label prediction of kinase-substrate phosphorylation events through deep learning

Kathryn E. Kirchoff,Shawn M. Gomez
DOI: https://doi.org/10.1101/2020.02.04.934216
2020-02-05
Abstract:Abstract Kinase-catalyzed phosphorylation of proteins forms the back-bone of signal transduction within the cell, enabling the coordination of numerous processes such as the cell cycle, apoptosis, and differentiation. While on the order of 10 5 phosphorylation events have been described, we know the specific kinase performing these functions for less than 5% of cases. The ability to predict which kinases initiate specific individual phosphorylation events has the potential to greatly enhance the design of downstream experimental studies, while simultaneously creating a preliminary map of the broader phosphorylation network that controls cellular signaling. To this end, we describe EMBER, a deep learning method that integrates kinase-phylogeny information and motif-dissimilarity information into a multi-label classification model for the prediction of kinase-motif phosphorylation events. Unlike previous deep learning methods that perform single-label classification, we restate the task of kinase-motif phosphorylation prediction as a multi-label problem, allowing us to train a single unified model rather than a separate model for each of the 134 kinase families. We utilize a Siamese network to generate novel vector representations, or an embedding, of motif sequences, and we compare our novel embedding to a previously proposed peptide embedding. Our motif vector representations are used, along with one-hot encoded motif sequences, as input to a classification network while also leveraging kinase phylogenetic relationships into our model via a kinase phylogeny-weighted loss function. Results suggest that this approach holds significant promise for improving our map of phosphorylation relations that underlie kinome signaling. Availability https://github.com/gomezlab/EMBER
What problem does this paper attempt to address?