Sitetack: A Deep Learning Model that Improves PTM Prediction by Using Known PTMs

Clair S. Gutierrez, Alia A. Kassim, Benjamin D. Gutierrez, Ronald T. Raines

DOI: https://doi.org/10.1101/2024.06.03.596298

2024-06-04

Abstract: Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and therapeutic strategies. Deep learning has been used to predict PTM locations. Still, limitations in datasets and their analyses compromise success. Here we evaluate the use of known PTM sites in prediction via sequence-based deep learning algorithms. Specifically, PTM locations were encoded as a separate amino acid before sequences were encoded via word embedding and passed into a convolutional neural network that predicts the probability of a modification at a given site. Without labeling known PTMs, our model is on par with others. With labeling, however, we improved significantly upon extant models. Moreover, knowing PTM locations can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.
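The encoding step described in the abstract (known PTM locations written as a separate amino acid, then sequences turned into tokens for a word embedding) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the placeholder symbol `@`, the padding symbol `-`, the window size, the function names, and the choice to leave the queried residue itself unlabeled are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PTM_SYMBOL = "@"  # assumed placeholder for a residue carrying a known PTM
PAD_SYMBOL = "-"  # assumed padding for windows that run past the termini

# Integer vocabulary that an embedding layer could consume (hypothetical layout).
VOCAB: Dict[str, int] = {
    ch: i for i, ch in enumerate(PAD_SYMBOL + AMINO_ACIDS + PTM_SYMBOL)
}


def label_known_ptms(sequence: str, known_sites: Set[int]) -> str:
    """Replace residues at known PTM positions (0-based) with PTM_SYMBOL."""
    return "".join(
        PTM_SYMBOL if i in known_sites else aa for i, aa in enumerate(sequence)
    )


def windows_for_residue(
    sequence: str, known_sites: Set[int], target: str, flank: int
) -> List[Tuple[int, List[int]]]:
    """Return (position, token ids) for every candidate residue `target`.

    Each window spans `flank` residues on both sides of the candidate.
    The candidate itself is kept as its plain amino acid so that the label
    being predicted is not leaked into the input (an assumption of this sketch).
    """
    labeled = label_known_ptms(sequence, known_sites)
    padded = PAD_SYMBOL * flank + labeled + PAD_SYMBOL * flank
    result = []
    for pos, aa in enumerate(sequence):
        if aa != target:
            continue
        window = list(padded[pos : pos + 2 * flank + 1])
        window[flank] = aa  # center residue stays unlabeled
        result.append((pos, [VOCAB[ch] for ch in window]))
    return result


# Toy sequence with one known modified serine at position 5.
examples = windows_for_residue("MKSPTSA", {5}, target="S", flank=3)
```

Each token list could then be fed to an embedding layer followed by a convolutional network that outputs a modification probability for the center residue; that part is omitted here because the source gives no architectural details beyond those two components.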

Bioinformatics

What problem does this paper attempt to address?

This paper asks how known post-translational modification (PTM) sites can be used to improve deep-learning prediction of further PTM sites. The authors developed a deep-learning model named Sitetack, which encodes known PTM sites as separate amino acid symbols within the protein sequence in order to predict different types of PTM sites more accurately.

### Main problems and solutions

1. **Limitations of existing methods**:
   - Deep learning has already been used to predict PTM sites, but limitations in datasets and their analyses still compromise prediction performance.
   - Existing computational methods rarely evaluate systematically how known PTMs affect the prediction of sites of the same or other PTM types.
2. **Introduction of known PTM site information**:
   - The authors hypothesized that incorporating known PTM site information into model training could significantly improve prediction performance.
   - Known PTM sites are encoded as special amino acid symbols (such as "@" or "&"), and the labeled protein sequence is then passed into a convolutional neural network (CNN) for training.
3. **Verification and improvement**:
   - The authors tested this hypothesis on datasets covering multiple PTM types (such as phosphorylation, N-glycosylation, and O-glycosylation).
   - In most cases, the model that includes known PTM site information performs significantly better than the model without it.

### Key findings

- **Performance improvement**: Across multiple PTM types, especially phosphorylation and hydroxylation, the model that includes known PTM site information shows a significant performance improvement. For example, for the human phosphorylation model, the AUC improves from 0.881 to 0.931.
- **Interactions between PTMs**: The study also found cross-influences between certain PTMs. For example, knowing phosphorylation sites can improve the prediction accuracy of O-GlcNAc glycosylation.
- **Specific kinase models**: For phosphorylation prediction by specific kinases, the model that includes known phosphorylation site information also performs better.

### Summary

This paper shows that introducing known PTM site information can significantly improve the ability of deep-learning models to predict PTM sites. This helps predict protein post-translational modifications more accurately and offers a new perspective on the interactions between PTMs. The authors also provide a free online tool ([Sitetack](https://sitetack.net)) so that researchers can use these models for prediction.
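The AUC figures quoted above measure how well a model ranks true modification sites above unmodified ones. As a rough illustration of the metric only, the sketch below computes AUC as the probability that a randomly chosen positive site scores higher than a randomly chosen negative one. The labels and scores are made-up toy values, not results from the paper, and the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = modified site, 0 = unmodified site.
labels = [1, 1, 1, 0, 0, 0]
scores_without_known_ptms = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
scores_with_known_ptms = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2]

auc_without = roc_auc(labels, scores_without_known_ptms)  # 7 of 9 pairs ordered correctly
auc_with = roc_auc(labels, scores_with_known_ptms)        # 8 of 9 pairs ordered correctly
```

The pairwise form is quadratic in the number of sites and is meant for reading, not for large datasets; a rank-based implementation from an established library would be the practical choice.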

