HydLoc: A Tool for Hydroxyproline and Hydroxylysine Sites Prediction in the Human Proteome

Qixing Huang, Xingyu Chen, Yang Wang, Jinlong Li, Haiyan Liu, Yun Xie, Zong Dai, Xiaoyong Zou, Zhanchao Li
DOI: https://doi.org/10.1016/j.chemolab.2020.104035
IF: 4.175
2020-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Among post-translational modifications, hydroxylation has drawn less attention than others such as phosphorylation and acetylation. However, in addition to regulating protein stability, hydroxylation has been found to affect protein activity, so a better understanding of the biological processes involving hydroxylation is needed. Identifying hydroxylated substrates and their corresponding sites is important for studying the molecular mechanism of this modification. Because experimental approaches are time-consuming and labor-intensive, fast and convenient computational methods for identifying hydroxylation sites are highly desirable. Here, we present HydLoc (Hydroxylation sites Location), a random forest-based predictor of hydroxylation sites in human proteins that uses sequence information and physicochemical properties. In leave-one-out cross-validation on the training dataset, the accuracies are 84.25% and 80.61% for proline (P) and lysine (K) residues, respectively. On the independent test dataset, HydLoc achieved accuracies of 90.74% and 81.25% for predicting P and K hydroxylation sites, respectively. The sensitivity values obtained were 96.29% for P and 75.00% for K, which exceed those of existing methods. A user-friendly web server for HydLoc is available at https://www.gdpu-bioinfolab.com/hydloc/
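For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how accuracy and sensitivity are conventionally computed from binary predictions. It is not the HydLoc implementation: the function names and the toy labels are hypothetical and chosen only for illustration, and the numbers do not come from the paper's datasets.

```python
# Illustrative only: standard definitions of accuracy and sensitivity
# for a binary site predictor (1 = hydroxylated, 0 = not hydroxylated).
# The labels below are made-up toy data, not data from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all sites that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of true modification sites that were recovered."""
    return tp / (tp + fn)

# Hypothetical toy example: 4 positive and 4 negative candidate sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
print(f"accuracy={acc:.2%}, sensitivity={sens:.2%}")
```

Sensitivity considers only the true modification sites, so a predictor can have high sensitivity and lower accuracy (or the reverse) depending on how it handles negative sites.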