iLys-Khib: Identify lysine 2-Hydroxyisobutyrylation sites using mRMR feature selection and fuzzy SVM algorithm

Zhe Ju, Shi-Yun Wang
DOI: https://doi.org/10.1016/j.chemolab.2019.06.009
IF: 4.175
2019-08-01
Chemometrics and Intelligent Laboratory Systems
Abstract:<p>As a new type of histone mark, lysine 2-Hydroxyisobutyrylation (K<sub>hib</sub>) is known to affect the association between histone and DNA. The accurate identification of K<sub>hib</sub> sites is significant for further exploration of the biological functions and molecular mechanisms of K<sub>hib</sub>. In this study, a novel bioinformatics tool named iLys-Khib is developed to predict K<sub>hib</sub> sites. Three kinds of effective features (amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs) are incorporated to encode K<sub>hib</sub> sites. The maximum relevance minimum redundancy (mRMR) feature selection algorithm is then adopted to remove redundant features. Moreover, a fuzzy support vector machine algorithm is proposed to handle noise in the K<sub>hib</sub> site training dataset. As illustrated by 10-fold cross-validation, iLys-Khib achieves satisfactory performance, with a sensitivity of 74.48%, a specificity of 65.77%, an accuracy of 70.12%, and a Matthews correlation coefficient of 0.4040. Feature analysis shows that the polarity factor features play significant roles in the prediction of K<sub>hib</sub> sites. These analyses and prediction results might provide some clues for understanding the molecular mechanisms of K<sub>hib</sub>. A user-friendly web server for iLys-Khib is available at <a href="http://bioinform.cn/iLys_Khib/">http://bioinform.cn/iLys_Khib/</a>.</p>
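Two of the encodings named in the abstract, binary encoding and the composition of k-spaced amino acid pairs (CKSAAP), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 15-residue window, the range of k, and the example peptide are all assumptions chosen for illustration, since the abstract does not state these settings.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(peptide):
    """One-hot encode each residue; characters outside the alphabet map to all zeros."""
    vec = []
    for res in peptide:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(peptide, k_max=2):
    """Frequencies of residue pairs separated by k positions, for k = 0..k_max.

    k_max=2 is an assumed value, not one taken from the paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(peptide) - k - 1  # number of k-spaced pairs in the peptide
        for i in range(n_windows):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard characters
                counts[pair] += 1
        features.extend(counts[p] / n_windows for p in pairs)
    return features


# Hypothetical 15-residue window with the candidate lysine (K) at the center.
peptide = "ACDEFGHKLMNPQRS"
one_hot = binary_encoding(peptide)
pair_composition = cksaap(peptide, k_max=2)
```

Under these assumptions the one-hot vector has 15 × 20 = 300 entries and the pair composition has 3 × 400 = 1200 entries, which gives a sense of why a feature selection step such as mRMR is useful before training a classifier.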
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical