Predicting Protein Lysine Methylation Sites by Incorporating Single-Residue Structural Features into Chou's Pseudo Components

Hao Qiu, Yanzhi Guo, Lezheng Yu, Xuemei Pu, Menglong Li
DOI: https://doi.org/10.1016/j.chemolab.2018.05.007
IF: 4.175
2018-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Identifying methylated residues helps us understand the molecular mechanisms of many biological processes. Currently, almost all existing computational methods for methylation site prediction are based on protein sequences. However, the 3-D structures of proteins are more directly correlated with their biological properties than the sequences are. Because few similar works have been done before, we propose a novel method for predicting protein lysine methylation sites based on single-residue structural features. Unlike previous works, which take as samples fragments centered on the methylated site and containing several neighboring residues, this paper treats only the single methylated lysine site as a sample. On the basis of the 3-D structures of methylated proteins, we give a comprehensive feature representation for each methylated lysine by combining accessible surface area (ASA), protrusion index (CX), depth index (DPX), secondary structure (SS), residue interaction network (RIN) and electrostatic potential (EP). All of these features characterize the environment of each methylated lysine well; in other words, the structural information of the neighboring residues is already integrated into the features of the lysine itself. Our analysis suggests that building the model on single sites is more efficient than adding adjacent residues. The prediction model was assessed on the testing set and performed well, with a sensitivity of 95.1% and a specificity of 89.0%. Moreover, a common independent dataset was collected to further evaluate our model against five other existing sequence-based methods. The prediction results indicate that our method outperforms them, and all experimentally confirmed methylated sites are successfully identified by our model. Finally, we conducted predictions on a proteomic scale to provide guidance for further experiments. All results indicate that our method can be a useful tool for identifying methylated lysine sites.
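For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how sensitivity and specificity are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or code.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true methylated sites recovered.
    specificity = TN / (TN + FP): fraction of non-methylated sites rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration only (not the paper's data).
sn, sp = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sn:.1%}, specificity={sp:.1%}")
# prints: sensitivity=90.0%, specificity=80.0%
```

Reporting both values matters for this kind of task because non-methylated lysines typically far outnumber methylated ones, so overall accuracy alone can hide poor recovery of the positive class.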