Accurate in Silico Prediction of Species-Specific Methylation Sites Based on Information Gain Feature Optimization

Pingping Wen,Shao-Ping Shi,Haodong Xu,Lina Wang,Jian‐Ding Qiu
DOI: https://doi.org/10.1093/bioinformatics/btw377
IF: 5.8
2016-01-01
Bioinformatics
Abstract:As one of the most important reversible types of post-translational modification, protein methylation catalyzed by methyltransferases carries many pivotal biological functions as well as many essential biological processes. Identification of methylation sites is prerequisite for decoding methylation regulatory networks in living cells and understanding their physiological roles. Experimental methods are limitations of labor-intensive and time-consuming. While in silicon approaches are cost-effective and high-throughput manner to predict potential methylation sites, but those previous predictors only have a mixed model and their prediction performances are not fully satisfactory now. Recently, with increasing availability of quantitative methylation datasets in diverse species (especially in eukaryotes), there is a growing need to develop a species-specific predictor. Here, we designed a tool named PSSMe based on information gain (IG) feature optimization method for species-specific methylation site prediction. The IG method was adopted to analyze the importance and contribution of each feature, then select the valuable dimension feature vectors to reconstitute a new orderly feature, which was applied to build the finally prediction model. Finally, our method improves prediction performance of accuracy about 15% comparing with single features. Furthermore, our species-specific model significantly improves the predictive performance compare with other general methylation prediction tools. Hence, our prediction results serve as useful resources to elucidate the mechanism of arginine or lysine methylation and facilitate hypothesis-driven experimental design and validation. Availability and Implementation: The tool online service is implemented by C# language and freely available at http://bioinfo.ncu.edu.cn/PSSMe.aspx. Contact: jdqiu@ncu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
What problem does this paper attempt to address?