An Ensemble Classifier Based on Stacked Generalization for Predicting Membrane Protein Types

Lei Guo, Shunfang Wang, Zicheng Cao
DOI: https://doi.org/10.1109/cisp-bmei.2017.8302278
2017-01-01
Abstract: Membrane proteins occur in many organisms, and their types are closely related to the function of biological organs. Working from the amino acid sequence, this paper fuses two feature representations, pseudo amino acid composition (PseAAC) and dipeptide composition (DipC), into a new feature representation. Linear discriminant analysis (LDA) is then applied to reduce the dimensionality of the fused representation. Finally, a two-layer stacking model is constructed to predict membrane protein types, with four base classifiers (SVM, KNN, RF and NN) in layer 1 and a meta-classifier (multiple logistic regression, MLR) in layer 2. The experimental results show that the proposed ensemble classifier outperforms every base classifier. The overall accuracies on the jackknife test and the independent dataset test reach 85.47% and 90.70%, respectively, which suggests a new approach for future prediction of membrane protein types.
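The two-layer design described in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation. The feature matrix is synthetic (a stand-in for the fused PseAAC + DipC features), all hyperparameters are assumptions, the multinomial `LogisticRegression` is used as a stand-in for the MLR meta-classifier, and a simple held-out split replaces the jackknife and independent dataset tests reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the fused feature matrix (assumption: 4 classes,
# 60 features; the real data set and its dimensions are not given here).
X, y = make_classification(
    n_samples=600, n_features=60, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Layer 1: the four base classifier families named in the abstract.
# Hyperparameters are illustrative guesses, not values from the paper.
base_learners = [
    ("svm", SVC(random_state=0)),  # contributes decision_function scores
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
]

# Layer 2: logistic regression trained on out-of-fold outputs of layer 1
# (cv=5 produces those out-of-fold predictions internally).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# LDA projects the features to at most (n_classes - 1) dimensions
# before they reach the stacked ensemble.
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(), stack)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Placing LDA inside the pipeline means the projection is fitted only on the training split, so the held-out score is not inflated by label information from the test samples.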