Support Vector Machine for Unbalanced Data Based on Sample Properties Under-Sampling Approaches

TAO Xin-min, HAO Si-yuan, ZHANG Dong-xue, LI Zhen
2013-01-01
Abstract: The classical support vector machine (SVM) algorithm classifies unbalanced data sets poorly. Therefore, an under-sampling algorithm based on sample properties is presented. First, using sample information in the kernel space, a certain percentage of the majority instances located near the classification interface are selected. Then, according to sample density, representative majority samples are chosen from those selected. This not only reduces the number of majority instances but also shifts the SVM classification interface toward the majority instances. Experimental results show that, compared with other data-preprocessing methods for unbalanced data set classification, the proposed method improves the SVM's classification performance on the minority class, as well as its overall classification performance and robustness.
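The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact criteria, so the sketch assumes that (a) closeness to the classification interface is approximated by the kernel-space distance to the minority-class mean under an RBF kernel, and (b) density is the mean kernel similarity to the other selected samples. The function names and the parameters `near_frac`, `keep_frac`, and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel values k(a, b) = exp(-gamma * ||a - b||^2).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def undersample_majority(X_maj, X_min, near_frac=0.5, keep_frac=0.5, gamma=1.0):
    """Return sorted indices of the majority samples to keep (illustrative only)."""
    # Stage 1 (assumed criterion): squared kernel-space distance from each
    # majority sample to the minority-class mean. For an RBF kernel k(x, x) = 1.
    K_xm = rbf_kernel(X_maj, X_min, gamma)
    K_mm = rbf_kernel(X_min, X_min, gamma)
    d2 = 1.0 - 2.0 * K_xm.mean(axis=1) + K_mm.mean()
    n_near = max(1, int(round(near_frac * len(X_maj))))
    near_idx = np.argsort(d2)[:n_near]  # majority samples closest to the minority class

    # Stage 2 (assumed criterion): density as mean kernel similarity to the
    # other stage-1 samples; keep the densest ones as representatives.
    K_nn = rbf_kernel(X_maj[near_idx], X_maj[near_idx], gamma)
    density = K_nn.mean(axis=1)
    n_keep = max(1, int(round(keep_frac * n_near)))
    keep = near_idx[np.argsort(-density)[:n_keep]]
    return np.sort(keep)

# Synthetic, made-up data: 200 majority points and 20 minority points.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.5, 0.5, size=(20, 2))
idx = undersample_majority(X_maj, X_min, near_frac=0.4, keep_frac=0.5)
X_maj_reduced = X_maj[idx]  # reduced majority set to train the SVM on
```

The reduced majority set, together with all minority samples, would then be passed to an ordinary SVM trainer. The two fractions control how aggressively the majority class is thinned and would need tuning per data set.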