Computational Prediction of Protein Arginine Methylation Based on Composition–Transition–Distribution Features

Ruiyan Hou, Jin Wu, Lei Xu, Quan Zou, Yi-Jun Wu
DOI: https://doi.org/10.1021/acsomega.0c03972
IF: 4.1
2020-10-19
ACS Omega
Abstract: Arginine methylation is one of the most essential protein post-translational modifications, and identifying arginine methylation sites is a critical problem in biological research. Unfortunately, experimental methods such as mass spectrometry are expensive and time-consuming, so predicting arginine methylation with machine learning offers a faster and more efficient alternative. In this paper, we focus on the systematic characterization of arginine methylation with composition–transition–distribution (CTD) features. The presented framework consists of three stages. In the first stage, we extract CTD features from 1750 samples and use a decision tree to generate predictions, reaching an accuracy of 96%. In the second stage, a support vector machine predicts the number of arginine methylation sites with an R-squared of 0.36. In the third stage, experiments on the updated arginine methylation site data set show that combining CTD features with a random forest classifier outperforms previous methods, with identification accuracies of 82.1% and 82.5% on the single-methylarginine and double-methylarginine data sets, respectively. The findings presented in this paper may help future research on arginine methylation.
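To make the CTD descriptors mentioned in the abstract concrete, here is a minimal sketch for a single physicochemical property. It assumes the widely used Dubchak-style three-class hydrophobicity grouping. The abstract does not state which properties, window length, or quantile-rounding rule the authors used, so the grouping, the `ctd` function name, and the feature keys are illustrative assumptions rather than the paper's implementation.

```python
from math import ceil

# Assumed three-class hydrophobicity grouping (Dubchak-style); the paper may
# use different or additional physicochemical properties.
HYDROPHOBICITY = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}


def encode(seq, groups):
    """Map each residue to its class label; symbols outside all groups (e.g. 'X' padding) are dropped."""
    lookup = {aa: label for label, aas in groups.items() for aa in aas}
    return [lookup[aa] for aa in seq.upper() if aa in lookup]


def ctd(seq, groups=HYDROPHOBICITY):
    """Return 21 CTD features (3 composition, 3 transition, 15 distribution) for one property."""
    classes = encode(seq, groups)
    n = len(classes)
    if n == 0:
        raise ValueError("sequence has no residues in the given groups")
    labels = sorted(groups)
    feats = {}

    # Composition: fraction of residues falling in each class.
    for c in labels:
        feats[f"C{c}"] = classes.count(c) / n

    # Transition: fraction of adjacent residue pairs that switch between
    # two different classes, counting both directions (a->b and b->a).
    pairs = list(zip(classes, classes[1:]))
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            switches = sum(1 for x, y in pairs if {x, y} == {a, b})
            feats[f"T{a}{b}"] = switches / len(pairs) if pairs else 0.0

    # Distribution: position (as % of length) of the first, 25%, 50%, 75%
    # and 100% occurrence of each class. Rounding conventions differ between
    # tools; here the k-th occurrence uses k = max(1, ceil(q * count)).
    for c in labels:
        positions = [i + 1 for i, x in enumerate(classes) if x == c]
        for q in (0.0, 0.25, 0.5, 0.75, 1.0):
            key = f"D{c}_{int(q * 100):03d}"
            if not positions:
                feats[key] = 0.0
                continue
            k = max(1, ceil(q * len(positions)))
            feats[key] = 100.0 * positions[k - 1] / n
    return feats


if __name__ == "__main__":
    # Toy RGG-style window; real inputs would be fixed-length windows around each arginine.
    for name, value in ctd("RGGRGG").items():
        print(name, round(value, 2))
```

Repeating this over several property groupings (common CTD toolkits use seven, giving 7 × 21 = 147 features) yields a fixed-length vector per sequence window, which could then be passed to a decision tree, support vector machine, or random forest as the abstract describes.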
chemistry, multidisciplinary
What problem does this paper attempt to address?