Entropy‐based hybrid sampling ensemble learning for imbalanced data

Li Dongdong, Chi Ziqiu, Wang Bolu, Wang Zhe, Yang Hai, Du Wenli
DOI: https://doi.org/10.1002/int.22388
IF: 8.993
2021-03-05
International Journal of Intelligent Systems
Abstract:<p>Sampling is one of the most commonly used techniques for dealing with imbalanced data. Most existing undersampling methods randomly select samples from the negative class with replacement. However, this may discard important information in the training data. Moreover, increasing the positive data by oversampling in highly imbalanced situations may cause class overlap. To overcome these problems, this paper proposes a hybrid sampling method. The method uses information entropy to take the distribution of the training data into account, and thereby identifies the important samples during undersampling. Meanwhile, because oversampling only extends the positive data to the size of each negative-class subset, the overlap problem is alleviated. Further, the method retains all the data during training and generates multiple data views from the original training data. Each view is then handled by an individual base classifier. Finally, all the base classifiers are combined by an ensemble method. The proposed method is named Entropy‐based Hybrid Sampling Ensemble Learning (EHSEL). In addition, EHSEL is applied to three different kinds of base classifiers to validate its robustness. Experimental results show the effectiveness of EHSEL on real‐world imbalanced data sets.</p>
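The pipeline outlined in the abstract (entropy-guided splitting of the negative class, oversampling the positives up to each subset's size, one base classifier per view, then an ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how entropy is computed or which classifiers are used, so the k-nearest-neighbour label entropy, the round-robin split, the interpolation-based oversampling, the nearest-centroid base classifier, and the majority vote below are all assumptions chosen for brevity. Every function name is hypothetical.

```python
import math
import random
from collections import Counter

def knn_label_entropy(i, X, y, k=5):
    """Shannon entropy (bits) of class labels among the k nearest neighbours of X[i].
    Assumption: the paper's entropy measure may be defined differently."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    counts = Counter(y[j] for _, j in dists[:k])
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def split_negatives(X, y, n_subsets, k=5):
    """Partition ALL negative indices into subsets, dealt out in descending
    entropy order so each subset receives some high-entropy samples."""
    neg = [i for i, label in enumerate(y) if label == 0]
    neg.sort(key=lambda i: knn_label_entropy(i, X, y, k), reverse=True)
    return [neg[s::n_subsets] for s in range(n_subsets)]

def oversample(pos_points, target_size, rng):
    """Grow the positive set to target_size by interpolating random positive pairs."""
    out = list(pos_points)
    while len(out) < target_size:
        a, b = rng.choice(pos_points), rng.choice(pos_points)
        t = rng.random()
        out.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return out

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def fit_ensemble(X, y, n_subsets=3, k=5, seed=0):
    """One nearest-centroid model per balanced view: (neg_centroid, pos_centroid)."""
    rng = random.Random(seed)
    pos = [X[i] for i, label in enumerate(y) if label == 1]
    models = []
    for subset in split_negatives(X, y, n_subsets, k):
        neg_view = [X[i] for i in subset]
        pos_view = oversample(pos, len(neg_view), rng)  # only up to the subset size
        models.append((centroid(neg_view), centroid(pos_view)))
    return models

def predict(models, x):
    """Majority vote over the base classifiers (ties go to the positive class)."""
    votes = sum(math.dist(x, c_pos) <= math.dist(x, c_neg) for c_neg, c_pos in models)
    return 1 if 2 * votes >= len(models) else 0

# Toy data: 30 negatives on a grid, 4 positives just beyond one corner.
X = [(float(i % 6), float(i // 6)) for i in range(30)]
X += [(5.5, 4.5), (6.0, 5.0), (6.5, 5.5), (7.0, 6.0)]
y = [0] * 30 + [1] * 4

subsets = split_negatives(X, y, n_subsets=3)
models = fit_ensemble(X, y, n_subsets=3)
print(predict(models, (7.0, 6.0)), predict(models, (0.0, 0.0)))
```

In this sketch every negative sample lands in exactly one subset, so no training data is discarded, and the positive class is never grown beyond the size of the subset it is paired with.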
computer science, artificial intelligence