Prediction And Analysis Of Protein Methylarginine And Methyllysine Based On Multisequence Features

Le-Le Hu, Zhen Li, Kai Wang, Shen Niu, Xiao-He Shi, Yu-Dong Cai, Hai-Peng Li
DOI: https://doi.org/10.1002/bip.21645
2011-01-01
Biopolymers
Abstract: Protein methylation, one of the most important post-translational modifications, typically takes place on arginine or lysine residues. This reversible modification is involved in a series of basic cellular processes. Identifying methylated proteins and their sites will facilitate the understanding of the molecular mechanism of methylation. Alongside experimental methods, computational prediction of methylated sites is desirable for its convenience and speed. Here, we propose a method dedicated to predicting methylated sites of proteins. Feature selection was performed on sequence conservation, physicochemical/biochemical properties, and structural disorder by applying the maximum relevance minimum redundancy and incremental feature selection methods. The prediction models were built with the nearest neighbor algorithm and evaluated by jackknife cross-validation. We built 11 and 9 predictors for methylarginine and methyllysine, respectively, and integrated them to predict methylated sites. As a result, the average prediction accuracies are 74.25% and 77.02% for the methylarginine and methyllysine training sets, respectively. Feature analysis suggested that evolutionary information and physicochemical/biochemical properties play important roles in the recognition of methylated sites. These findings may provide valuable information for exploring the mechanisms of methylation. Our method may serve as a useful tool for biologists to find potential methylated sites of proteins. © 2011 Wiley Periodicals, Inc. Biopolymers 95: 763-771, 2011.
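The evaluation pipeline named in the abstract (a nearest neighbor classifier, jackknife cross-validation, and incremental feature selection over a ranked feature list) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Euclidean distance, a single nearest neighbor, and a feature ranking that has already been produced (for example by maximum relevance minimum redundancy). The function names and the toy data are hypothetical.

```python
import math


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the sample closest to `query`.

    Assumption: Euclidean distance; the abstract does not state the metric.
    """
    best_index = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best_index]


def jackknife_accuracy(samples, labels):
    """Jackknife (leave-one-out) accuracy of a 1-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i and classify it using all remaining samples.
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest_x, rest_y) == labels[i]:
            correct += 1
    return correct / len(samples)


def incremental_feature_selection(samples, labels, ranked_features):
    """Score the top-k ranked features for k = 1..N and keep the best k.

    `ranked_features` is a list of column indices, best first. It is assumed
    to come from a separate ranking step, which is not implemented here.
    """
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        projected = [[row[c] for c in cols] for row in samples]
        acc = jackknife_accuracy(projected, labels)
        if acc > best_acc:  # ties keep the smaller feature set
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.8], [1.0, 1.2]]
y = [0, 0, 1, 1]
ranked = [0, 1]  # assumed ranking, best feature first

selected, accuracy = incremental_feature_selection(X, y, ranked)
print(selected, accuracy)
```

In this toy case the noisy second feature dominates the distance, so adding it lowers the jackknife accuracy and the selection stops at the first feature. That is the behavior incremental feature selection is meant to capture.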