Identify and Analysis Crotonylation Sites in Histone by Using Support Vector Machines

Wang-Ren Qiu, Bi-Qian Sun, Hua Tang, Jian Huang, Hao Lin
DOI: https://doi.org/10.1016/j.artmed.2017.02.007
IF: 7.011
2017-01-01
Artificial Intelligence in Medicine
Abstract: Objective: Lysine crotonylation (Kcr) is a newly discovered histone posttranslational modification that is specifically enriched at active gene promoters and potential enhancers in mammalian cell genomes. Although lysine crotonylation sites can be correctly identified with high-resolution mass spectrometry, the experimental methods are time-consuming and expensive. Computational methods for identifying these sites are therefore needed. Methods: We proposed a new encoding scheme, named position weight amino acid composition, to extract sequence information from the histone regions around crotonylation sites. Protein data were taken from the UniProt database. A series of steps were used to construct a strict and objective benchmark dataset for training and testing the proposed method. All samples were characterized by a significant number of features derived from position weight amino acid composition. A support vector machine was used to perform classification. Results: Based on a series of experiments, the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) were 71.69%, 98.7%, 94.43%, and 0.778, respectively, in jackknife cross-validation. Comparison results demonstrated that the proposed model outperformed the random forest algorithm. We also performed feature analysis on the samples. Conclusion: Identification of Kcr sites in histones is an indispensable step for decoding protein function. The method can therefore promote a deeper understanding of the physiological roles of crotonylation and provide useful information for developing drugs to treat various diseases associated with crotonylation. © 2017 Elsevier B.V. All rights reserved.
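The four metrics reported in the Results (Sn, Sp, Acc, MCC) all follow from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, and it does not reproduce the authors' data or pipeline.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (Sn, Sp, Acc, MCC) from confusion-matrix counts.

    tp/fn: crotonylation sites predicted as positive/negative.
    tn/fp: non-crotonylation sites predicted as negative/positive.
    """
    # Sensitivity: fraction of true sites that were recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all samples classified correctly.
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    sn, sp, acc, mcc = binary_metrics(tp=8, tn=9, fp=1, fn=2)
    print(f"Sn={sn:.2%} Sp={sp:.2%} Acc={acc:.2%} MCC={mcc:.3f}")
```

With a benchmark set where negatives greatly outnumber positives, Acc can remain high while Sn is much lower, which is why MCC is usually reported alongside it.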