mGOF-loc: A Novel Ensemble Learning Method for Human Protein Subcellular Localization Prediction

Leyi Wei, Minghong Liao, Xing Gao, Jingjing Wang, Weiqi Lin
DOI: https://doi.org/10.1016/j.neucom.2015.09.137
IF: 6
2016-01-01
Neurocomputing
Abstract: Predicting the subcellular locations of proteins is a critical step toward understanding their functions. Numerous computational methods have recently been developed for protein subcellular localization prediction, and most of them rely on Gene Ontology (GO) information for feature representation. Although prior research has shown that GO information improves predictive performance, it generates an extremely high-dimensional feature space whose dimension keeps growing as the number of terms in the GO database increases. To address this issue, we propose a novel feature representation method that fully exploits sequence evolutionary information rather than GO information. Using the proposed method, we generate a comprehensive set of 828 features from three sources: physicochemical properties, the position-specific scoring matrix (PSSM), and the k-skip-n-gram model. By training a multi-label ensemble classifier on the proposed features, we further develop a novel multi-label learning method, namely mGOF-loc. Results on an updated large-scale dataset covering 37 subcellular locations show that mGOF-loc outperforms existing methods. A web server implementing mGOF-loc is freely available at http://server.malab.cn/mGOF-loc/Index.html.
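The abstract names the k-skip-n-gram model as one feature source but does not define it. As a rough illustration only, the sketch below shows one common formulation for n = 2 (a k-skip-bigram): it counts ordered residue pairs separated by at most k skipped positions and normalizes the counts to frequencies. The function name `k_skip_bigram_features`, the 20-letter alphabet, and the normalization are assumptions made for this example; they are not taken from the paper and do not reproduce its 828-feature set.

```python
from itertools import product

# Assumption: the 20 standard amino acids; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_skip_bigram_features(sequence, k=1):
    """Return a 400-dimensional frequency vector of ordered residue pairs.

    A pair (sequence[i], sequence[j]) is counted when j - i - 1 <= k,
    that is, when at most k residues are skipped between the two.
    This is a hypothetical formulation, not the paper's definition.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence)):
        for skip in range(k + 1):
            j = i + skip + 1
            if j >= len(sequence):
                break
            pair = sequence[i] + sequence[j]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
    if total == 0:
        return [0.0] * len(pairs)
    # Normalize so the vector does not depend on sequence length.
    return [counts[p] / total for p in pairs]


features = k_skip_bigram_features("MKTAYIAKQR", k=2)
print(len(features))
```

Whatever the sequence length, the vector stays at 400 entries, in contrast to the GO-based representation that the abstract describes as growing with the GO database.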