Abstract: Phosphorylation, one of the most important post-translational modifications, plays a key role in many cellular physiological processes and in the onset of disease. In recent years, computational methods have increasingly been applied to the prediction of protein phosphorylation sites. However, most existing methods rely on simple protein sequence features that provide limited contextual information. To overcome this limitation, we propose DeepMPSF, a phosphorylation site prediction model based on multiple protein sequence features. It uses two types of features: sequence semantic features, which comprise residue type information and relative position information within the protein sequence, and protein background biophysical features, which carry global semantic information obtained from pretrained models and give a more comprehensive view of the protein background. To extract these features, DeepMPSF employs two separate subnetworks, the SSFE module and the BBFE module, which automatically extract high-level semantic features. The model also incorporates an ensemble learning strategy for handling imbalanced datasets during training and prediction. DeepMPSF is trained and evaluated on a well-established dataset of human proteins. Comparison with other benchmark methods shows that DeepMPSF outperforms them in predicting both S/T residues and Y residues. In particular, DeepMPSF generalizes well in cross-species blind tests, with average improvements of 5.63%/5.72%, 22.28%/25.94%, 20.11%/17.49%, and 26.40%/28.33% on the <i>Mus musculus</i>/<i>Rattus norvegicus</i> test sets in area under the ROC curve (AUC), area under the PR curve, F1-score, and MCC, respectively. It also performs well on the most recently updated cases of natural proteins with functional phosphorylation sites. Through an ablation study and visual analysis, we find that the design of the different feature modules contributes significantly to the classification accuracy of DeepMPSF, which provides valuable insights for predicting phosphorylation sites and offers effective support for future downstream research.
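The abstract reports F1-score and MCC alongside the two AUC metrics. The following is a minimal sketch of how these two threshold-based metrics are computed from binary predictions, using their standard definitions. It is not taken from the DeepMPSF code: the function names and the toy labels are hypothetical, chosen only to show why both metrics are informative when phosphorylated sites are a small minority.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks a phosphorylated site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: 2 positive sites among 10 candidate residues.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn))   # 0.5
print(mcc(tp, fp, tn, fn))    # 0.375

# A predictor that labels every residue negative is 80% accurate on this
# toy data, yet both F1 and MCC are 0.0.
all_negative = [0] * len(y_true)
tp0, fp0, tn0, fn0 = confusion_counts(y_true, all_negative)
print(f1_score(tp0, fp0, fn0), mcc(tp0, fp0, tn0, fn0))  # 0.0 0.0
```

On this toy data a trivial all-negative predictor reaches 80% accuracy while scoring zero on both metrics, which is why F1 and MCC are commonly reported for imbalanced site-prediction benchmarks.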

Structure-Based Prediction of Protein Phosphorylation Sites Using an Ensemble Approach

PredPhos: an Ensemble Framework for Structure-Based Prediction of Phosphorylation Sites

DeepMPSF: A Deep Learning Network for Predicting General Protein Phosphorylation Sites Based on Multiple Protein Sequence Features

PhosAF: an Integrated Deep Learning Architecture for Predicting Protein Phosphorylation Sites with AlphaFold2 Predicted Structures

DeepPhos: prediction of protein phosphorylation sites with deep learning

A Novel Method for Predicting Protein Phosphorylation Via Site-Modification Network Profiles

General Phosphorylation Site Prediction Model Based on Attention Mechanism

Computational Prediction and Analysis of Species-Specific Fungi Phosphorylation Via Feature Optimization Strategy

Phosphorylation Site Prediction Integrating The Position Feature With Sequence Evolution Information

Prediction of Protein Phosphorylation Sites by Integrating Secondary Structure Information and Other One-Dimensional Structural Properties

Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites using Deep Learning Models

Prediction of kinase-specific phosphorylation sites with sequence features by a log-odds ratio approach

Prediction of Protein Kinase-Specific Phosphorylation Sites in Hierarchical Structure Using Functional Information and Random Forest

Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions

Phosphorylation Site Prediction with A Modified K-Nearest Neighbor Algorithm and BLOSUM62 Matrix

PhosIDN: an integrated deep neural network for improving protein phosphorylation site prediction by combining sequence and protein–protein interaction information

A Novel Network-Based Computational Method to Predict Protein Phosphorylation on Tyrosine Sites

Phosphopredict: A Bioinformatics Tool for Prediction of Human Kinase-Specific Phosphorylation Substrates and Sites by Integrating Heterogeneous Feature Selection

Phosphorylation Site Prediction Based on k-Nearest Neighbor Algorithm and BLOSUM62 Matrix

Phosphate Binding Sites Prediction in Phosphorylation-Dependent Protein-Protein Interactions

Prediction of Kinase-Specific Phosphorylation Sites by One-Class SVMs