PredGly: Predicting Lysine Glycation Sites for Homo Sapiens Based on XGboost Feature Optimization

Jialin Yu, Shaoping Shi, Fang Zhang, Guodong Chen, Man Cao
DOI: https://doi.org/10.1093/bioinformatics/bty1043
IF: 5.8
2018-01-01
Bioinformatics
Abstract:
MOTIVATION: Protein glycation is a well-known post-translational modification (PTM) that proceeds through a two-step non-enzymatic reaction. Glycation impairs protein function and alters protein characteristics, and it is therefore linked to many human diseases. Systematic detection of glycation sites remains difficult because glycated residues lack distinctive sequence patterns. Computational approaches, which can filter candidate sites before experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method relied on a small training dataset, so its model does not generalize well.
RESULTS: By searching a new database, we collected a large dataset for Homo sapiens. PredGly, a novel tool developed by combining multiple features, predicts lysine glycation sites for H. sapiens. XGboost was adopted to optimize the feature vectors and improve model performance. In a comparison of several classifiers, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation prediction tools. These results suggest that PredGly can provide more instructive guidance for further experimental research on lysine glycation.
AVAILABILITY AND IMPLEMENTATION: https://github.com/yujialinncu/PredGly.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
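As an illustration of what "filtering candidate sites" involves, a site-level predictor first has to enumerate every lysine in a protein and describe its sequence context. The abstract does not state how PredGly does this, so the following is only a minimal sketch under assumptions: the function name `lysine_windows`, the window half-width, and the `X` padding character are all hypothetical choices and are not taken from the paper or its repository.

```python
def lysine_windows(sequence, half_width=7, pad="X"):
    """Return (position, window) pairs for every lysine (K) in a protein sequence.

    position is 1-based. Each window has length 2 * half_width + 1 with the
    lysine at its center. half_width and pad are illustrative assumptions,
    not values reported for PredGly.
    """
    seq = sequence.upper()
    # Pad both ends so lysines near the termini still get full-length windows.
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # In the padded string the residue sits at index i + half_width,
            # so the slice starting at i is centered on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    for pos, window in lysine_windows("MKTAYIAKQR", half_width=3):
        print(pos, window)
```

Each window produced this way could then be encoded into the feature vectors that a selection step and a classifier operate on; the encoding itself is outside the scope of this sketch.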