An Improved Algorithm for Imbalanced Data and Small Sample Size Classification
Yong Hu, Dongfa Guo, Zengwei Fan, Chen Dong, Qiuhong Huang, Shengkai Xie, Guifang Liu, Jing Tan, Boping Li, Qiwei Xie
DOI: https://doi.org/10.4236/jdaip.2015.33004
2015-01-01
Journal of Data Analysis and Information Processing
Abstract: Traditional classification algorithms perform poorly on imbalanced data sets and on small sample sizes. To address this problem, a novel method is proposed that changes the class distribution by adding virtual samples generated by the windowed regression over-sampling (WRO) method. WRO reflects not only the additive effects but also the multiplicative effects between samples. The proposed method is compared with other over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS), on UCI data sets and on a Fourier transform infrared spectroscopy (FTIR) data set. Experimental results show that WRO achieves better performance than the other methods.
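The abstract does not describe how WRO's windowed regression generates virtual samples, so the sketch below illustrates only the SMOTE baseline it is compared against, not the authors' method. SMOTE creates each virtual minority sample by picking a minority sample, choosing one of its k nearest minority neighbours, and interpolating linearly between the two, which is a purely additive combination of samples. The function names `smote` and `k_nearest`, the default `k=5`, and the fixed `seed` are illustrative assumptions; this is a minimal standard-library sketch, not a reference implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style linear interpolation.

    Hypothetical helper for illustration; it is the SMOTE baseline, not WRO.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a minority seed sample
        j = rng.choice(neighbours[i])      # pick one of its k nearest minority neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        # New point lies on the segment between x and its neighbour nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Usage: oversample a toy two-feature minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new = smote(minority, n_new=6, k=2)
print(len(new))  # 6
```

Because every synthetic point lies on a line segment between two existing minority samples, SMOTE can only fill in the region already spanned by the minority class; the abstract's point that WRO also reflects multiplicative effects between samples is what distinguishes it from this kind of interpolation.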