Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

Yan Han, Mingxiang He, Qixian Lu
DOI: https://doi.org/10.1007/978-981-13-5841-8_64
2018-01-01
Abstract: To alleviate the impact of imbalanced data on the support vector machine (SVM), an ensemble classification method based on hybrid sampling is proposed. First, the imbalance ratio of the data is reduced by the ADASYN-NCL hybrid sampling method, which combines Adaptive Synthetic Sampling (ADASYN) oversampling with Neighborhood Cleaning Rule (NCL) undersampling. Then, within the AdaBoost framework, misclassified minority-class and majority-class samples receive different weight adjustments, and several classifiers are selectively integrated to improve classification. Finally, 10 imbalanced data sets from the KEEL repository are used as test objects, with F-value and G-mean as evaluation metrics, to verify the performance of the algorithm. The experimental results show that the proposed algorithm has certain advantages in classifying imbalanced data sets.
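The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions, which the abstract itself does not spell out: F-value as the harmonic mean of precision and recall on the minority (positive) class, and G-mean as the geometric mean of the true positive rate and the true negative rate. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_value(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (assumed definition, beta = 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of true positive rate and true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


# Toy data (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one hit, one miss, one false alarm
y_all_majority = [0] * 10                 # always predicts the majority class

print(f_value(y_true, y_mixed), g_mean(y_true, y_mixed))
print(f_value(y_true, y_all_majority), g_mean(y_true, y_all_majority))
```

A classifier that always predicts the majority class is 80% accurate on this toy data yet scores zero on both metrics, which is one reason such metrics are commonly preferred over plain accuracy for imbalanced data.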