Research on Classification Method of High-Dimensional Class-Imbalanced Data Sets Based on SVM

Chunkai Zhang, Ying Zhou, Jianwei Guo, Guoquan Wang, Xuan Wang
DOI: https://doi.org/10.1007/s13042-018-0853-2
2018-01-01
International Journal of Machine Learning and Cybernetics
Abstract: High-dimensional problems lead to poor classification results because some combinations of features have an adverse effect on classification, while class-imbalanced problems cause the classifier to focus on the majority class and neglect the minority class, because the majority class has more samples than the minority class. Classification problems that are both high-dimensional and class-imbalanced arise in many fields, such as bioinformatics and healthcare. Many researchers have studied either the high-dimensional problem or the class-imbalanced problem and proposed a series of algorithms, but they have overlooked the new problem that arises when the two occur together: high-dimensional problems affect the sampling process, while class-imbalanced problems interfere with feature selection. This paper first analyses the new problem arising from the mutual influence of the two problems, and then introduces SVM and analyses its advantages in dealing with high-dimensional and class-imbalanced problems. Next, it proposes a new algorithm named BRFE-PBKS-SVM for high-dimensional class-imbalanced datasets, which improves SVM-RFE by taking the class-imbalanced problem into account during feature selection, and also improves SMOTE so that over-sampling can operate in the Hilbert space with an adaptive over-sampling rate determined by PSO. Finally, experimental results demonstrate the performance of the algorithm.
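For readers unfamiliar with the over-sampling step that the abstract builds on, the following is a minimal sketch of plain SMOTE-style interpolation between minority-class neighbours. It is not the paper's method: it works in the original input space rather than the Hilbert space, and it uses a fixed number of synthetic samples instead of a PSO-tuned rate. The function name `smote` and the parameters `n_new`, `k`, and `seed` are assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative only: input-space SMOTE with a fixed sample count, not the
    kernel-space, PSO-tuned variant described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority samples at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class is enlarged without duplicating samples exactly. In high-dimensional data the nearest-neighbour distances used here become less informative, which is one way the two problems named in the abstract can interact.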