A Classification Model For Class Imbalance Problem In Protein Subnuclear Localization

Liwen Wu, Yi Xiang, Yikun Yang, Gang Xue, Shaowen Yao, Xin Jin
DOI: https://doi.org/10.1109/CISP-BMEI.2018.8633252
2018-01-01
Abstract: Class imbalance is a common problem in protein subnuclear localization and can seriously degrade classification performance. However, few studies have focused on class imbalance in this setting. In this paper, we propose a novel oversampling-based method to address the class imbalance problem. First, protein features are captured with a position-specific scoring matrix (PSSM), and kernel principal component analysis (KPCA) is then used to extract the most informative components of the PSSM. Next, the proposed method, named M-SMOTE, generates new samples for the minority classes to eliminate the imbalance among classes. Finally, the processed samples are fed into a Random Forest classifier to predict the protein subnuclear localization. Experimental results obtained with the jackknife test indicate that M-SMOTE can effectively solve the class imbalance problem and improve classification performance in protein subnuclear localization.
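The oversampling step can be illustrated with a minimal sketch. The abstract does not specify how M-SMOTE differs from standard SMOTE, so the code below implements only the classic SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) and should not be read as the authors' method. The function names, the toy 2-D feature vectors (standing in for KPCA-reduced PSSM features), and the class labels are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (classic SMOTE idea,
    not the paper's M-SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(samples_by_class, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(v) for v in samples_by_class.values())
    balanced = {}
    for offset, (label, samples) in enumerate(samples_by_class.items()):
        extra = []
        if len(samples) < target:
            extra = smote_like_oversample(
                samples, target - len(samples), k, seed + offset
            )
        balanced[label] = list(samples) + extra
    return balanced


# Hypothetical toy data: 2-D vectors in place of KPCA-reduced PSSM features.
toy = {
    "nucleolus": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]],
    "nuclear_lamina": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],
}
balanced = balance(toy)
```

After balancing, every class has as many samples as the largest one, and each synthetic point lies on a line segment between two real minority samples. In a full pipeline, the balanced set would then be passed to a classifier such as a Random Forest.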