A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

Huaxiang Zhang, Zhichao Wang
DOI: https://doi.org/10.1007/978-3-642-25853-4_7
2011-01-01
Abstract: This study proposes a normal distribution-based over-sampling approach to balance the number of instances belonging to different classes in a data set. The balanced training data are used to learn unbiased classifiers for the original data set. Under some conditions, the proposed over-sampling approach generates samples whose expected mean and variance are similar to those of the original minority class data. Because the approach tries to generate synthetic data with a probability distribution similar to that of the original data, and also expands the class boundaries of the minority class, it may improve classification performance on the minority class. Experimental results show that the proposed approach outperforms alternative methods on benchmark data sets most of the time when combined with several classical classification algorithms.
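The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea under simplifying assumptions. It fits a per-feature normal distribution (mean and variance) to the minority class and draws synthetic minority samples from it until the minority class matches the largest class. The function name `normal_oversample`, the assumption that features are independent, and the use of Python's standard `random` and `statistics` modules are illustrative choices, not the authors' algorithm.

```python
import random
import statistics
from collections import Counter


def normal_oversample(X, y, minority_label, seed=0):
    """Illustrative sketch (hypothetical, not the authors' exact algorithm):
    balance a data set by appending synthetic minority samples drawn from a
    normal distribution fitted to the minority class.

    Assumes features are independent, so each feature j is sampled from its
    own univariate normal N(mean_j, std_j^2) estimated on the minority class.
    """
    rng = random.Random(seed)
    minority = [list(row) for row, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no samples with the given minority_label")

    counts = Counter(y)
    # Number of synthetic samples needed to match the largest class.
    n_new = max(counts.values()) - counts[minority_label]

    # Per-feature mean and population standard deviation of the minority class.
    columns = list(zip(*minority))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]

    # Draws can land outside the observed minority points, which loosely
    # mirrors the idea of expanding the minority class boundary.
    synthetic = [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stds)]
        for _ in range(n_new)
    ]

    X_balanced = [list(row) for row in X] + synthetic
    y_balanced = list(y) + [minority_label] * n_new
    return X_balanced, y_balanced


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, 1.8], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.3, 5.1]]
    y = [1, 1, 0, 0, 0, 0]
    X_bal, y_bal = normal_oversample(X, y, minority_label=1)
    print(y_bal.count(0), y_bal.count(1))  # 4 4
```

Because each synthetic value is drawn from N(mean_j, std_j^2), its expected mean and variance equal the minority-class estimates, which echoes the property stated in the abstract. Ignoring correlations between features is the main simplification here; a multivariate normal with a full covariance matrix would be a natural refinement.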