Identifying Protein Arginine Methylation Sites Using Global Features of Protein Sequence Coupled with Support Vector Machine Optimized by Particle Swarm Optimization Algorithm

Yan Zhang, Lijuan Tang, Hongyan Zou, Qin Yang, Xinliang Yu, Jianhui Jiang, Hailong Wu, Ruqin Yu
DOI: https://doi.org/10.1016/j.chemolab.2015.05.011
IF: 4.175
2015-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Protein methylation, which plays vital roles in signal transduction and many cellular processes, is one of the most common protein post-translational modifications. Identifying methylation sites is very helpful for understanding the fundamental molecular mechanisms of methylation-related biological processes. In silico prediction of methylation sites has emerged as a powerful approach to identifying them, and it also facilitates downstream characterization and site-specific investigation. Herein, we propose a novel strategy for predicting methylation sites based on a combination of the pseudo amino acid composition (PseAAC) and protein chain description as global features of the protein sequence. These global features comprehensively utilize amino acid composition and sequence-order information, along with the physicochemical properties and structural characteristics of the amino acids. A support vector machine (SVM) is used to build the prediction model for methylation sites on the basis of these global features. Meanwhile, a global stochastic optimization technique, particle swarm optimization (PSO), is employed to search effectively for the optimal SVM parameters. The prediction accuracy, sensitivity, specificity and Matthews correlation coefficient for the independent prediction set are 98.11%, 96.23%, 100% and 96.30%, respectively. These values indicate that our method performs well in identifying protein arginine methylation sites. For comparison, other predictors were also constructed using different feature extraction and modeling strategies. The results show that the proposed method can greatly improve the prediction of arginine methylation sites.
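The four figures reported in the abstract are standard confusion-matrix statistics. As a minimal sketch of how they relate, the Python snippet below computes them from raw counts. The function name `binary_metrics` and the counts passed to it are illustrative assumptions: the abstract does not state the size of the independent prediction set, and the counts (53 positive and 53 negative samples, with 2 positives missed) were chosen only because they happen to reproduce the reported percentages to two decimal places.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts, NOT taken from the paper: 53 methylated and
# 53 non-methylated test sites, with 2 methylated sites misclassified.
acc, sn, sp, mcc = binary_metrics(tp=51, fn=2, tn=53, fp=0)
print(f"accuracy={acc:.2%} sensitivity={sn:.2%} specificity={sp:.2%} MCC={mcc:.2%}")
```

Under these assumed counts, a specificity of 100% means no false positives, so the gap between accuracy and sensitivity comes entirely from the missed positive sites.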