Membrane Protein Type Prediction for High-Dimensional Imbalanced Datasets

Lei Guo, Shunfang Wang
DOI: https://doi.org/10.1109/itme.2018.00190
2018-01-01
Abstract: Research on membrane protein type prediction is of great significance because the type of a membrane protein is closely related to its function. In this paper, a new method is proposed for predicting membrane protein types. First, two feature extraction methods, PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix), are used to extract information from the membrane protein samples. Second, random forest-based feature selection is used to eliminate redundant features and reduce computational complexity. Third, the class imbalance of the dataset is addressed with a resampling technique that combines SMOTE (Synthetic Minority Over-sampling Technique) and Tomek links. Finally, an experimental comparative study is performed on the membrane protein dataset. The results show that the method effectively improves the accuracy of membrane protein type prediction.
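The SMOTE plus Tomek-link resampling step can be illustrated with a short sketch. This is a minimal, stdlib-only Python example built on assumptions of its own and is not the authors' implementation. It handles the two-class case only and uses brute-force Euclidean nearest neighbours. It oversamples the minority class up to the majority count, then removes both members of every Tomek link, meaning every pair of opposite-class samples that are each other's nearest neighbour. The function names `smote`, `tomek_links` and `smote_tomek`, and the defaults `k=5` and `seed=0`, are hypothetical.

```python
import math
import random

# Hypothetical two-class sketch of SMOTE + Tomek links; not the paper's code.


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating a random minority
    sample toward one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to base.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, with different labels that are
    mutual nearest neighbours in the whole dataset."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class up to the majority size with SMOTE,
    then drop both members of every Tomek link."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, max(0, n_majority - len(minority)), k, rng)
    X_res = list(X) + new
    y_res = list(y) + [minority_label] * len(new)
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

The abstract refers to several membrane protein types. A multi-class version of this sketch would oversample each minority class separately. In practice, the imbalanced-learn library provides this combination as `SMOTETomek` in `imblearn.combine`. Resampling should be applied only to training data, so that synthetic samples do not leak into the evaluation set.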