Prediction of carbamylated lysine sites based on the one-class k-nearest neighbor method.

Guohua Huang, You Zhou, Yuchao Zhang, Bi-Qing Li, Ning Zhang, Yu-Dong Cai
DOI: https://doi.org/10.1039/c3mb70195f
2013-01-01
Molecular BioSystems
Abstract: Protein carbamylation is an important post-translational modification that plays a pivotal role in a number of diseases, such as chronic renal failure and atherosclerosis. Identifying carbamylated sites in proteins is therefore essential for disease treatment and prevention. Yet the mechanism of action of carbamylated lysine sites is still not understood, and uncovering it remains a largely unsolved challenge, whether experimentally or theoretically. To address this problem, we present a computational framework for predicting and analyzing carbamylated lysine sites based on the one-class k-nearest neighbor method combined with two-stage feature selection. The one-class k-nearest neighbor method requires no negative samples in training. Experimental results showed that, using 280 optimal features, the method achieved a sensitivity (SN) of 82.50% in the jackknife test on the training set, and SN = 66.67%, specificity (SP) = 100.00% and Matthews correlation coefficient (MCC) = 0.8097 in the independent test on the testing set. Further analysis of the optimal features provided insights into the mechanism of action of carbamylated lysine sites. We anticipate that the method could be a useful tool for biologists investigating carbamylated lysine sites computationally.
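As an illustration of how a classifier can be trained without negative samples, the following is a minimal sketch of one possible one-class k-nearest neighbor rule. It is not the authors' implementation: the abstract does not give the distance measure, the value of k, or the decision rule, so the Euclidean distance, the mean-distance score, the quantile-based threshold, and the names `OneClassKNN` and `knn_mean_distance` are all assumptions made here for illustration.

```python
import numpy as np


def knn_mean_distance(query, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows.

    With exclude_self=True the smallest distance in each row is dropped, which
    gives a leave-one-out score when query and reference are the same array.
    """
    # Pairwise distance matrix of shape (n_query, n_reference).
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)
    dists.sort(axis=1)
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)


class OneClassKNN:
    """Hypothetical one-class k-NN: trained on positive samples only."""

    def __init__(self, k=3, quantile=0.95):
        self.k = k
        self.quantile = quantile  # assumed rule for setting the threshold

    def fit(self, positives):
        self.positives_ = np.asarray(positives, dtype=float)
        # Leave-one-out score of every training positive against the others.
        scores = knn_mean_distance(
            self.positives_, self.positives_, self.k, exclude_self=True
        )
        # Accept anything that lies as close to the positives as most
        # positives lie to each other.
        self.threshold_ = np.quantile(scores, self.quantile)
        return self

    def predict(self, samples):
        samples = np.asarray(samples, dtype=float)
        scores = knn_mean_distance(samples, self.positives_, self.k)
        return scores <= self.threshold_  # True = predicted positive
```

In this sketch, each row passed to `fit` would be the feature vector of a known carbamylated lysine site (for example, the 280 selected features), and `predict` flags a candidate site as positive when it lies about as close to the known sites as they lie to one another.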