Ensemble Strategy for Hard Classifying Samples in Class-Imbalanced Data Set

Yingze Yang, Pengcheng Xiao, Yijun Cheng, Weirong Liu, Zhiwu Huang
DOI: https://doi.org/10.1109/BigComp.2018.00033
2018-01-01
Abstract: Imbalanced data is ubiquitous and makes classification considerably more difficult. In this paper, we propose an ensemble strategy for imbalanced binary classification problems that focuses on hard classifying samples, i.e., samples that are difficult to classify. The strategy is built on a two-layer framework that combines the advantages of resampling methods and classifiers. In the first layer, a novel resampling strategy increases the weight of hard classifying samples in the dataset. In the second layer, an efficient ensemble strategy trains the classifier model by combining an extreme gradient boosting (XGBoost) classifier with the proposed resampling strategy. Experiments on 44 datasets from the KEEL dataset repository show that the proposed ensemble strategy outperforms four other state-of-the-art ensemble strategies. On imbalanced big data, the proposed strategy also scores better than various other ensemble strategies on three metrics.
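The abstract does not say how hard classifying samples are identified or how their weight is increased. As a rough illustration of the first layer only, the sketch below assumes the k-Disagreeing Neighbours (kDN) hardness measure (the fraction of a sample's k nearest neighbours that carry a different label) and assumes that "increasing the weight" means duplicating hard samples. Both choices, the threshold, and the names `knn_disagreement` and `resample_hard` are hypothetical and are not taken from the paper.

```python
import math


def knn_disagreement(X, y, k=5):
    """Hypothetical hardness score: fraction of each sample's k nearest
    neighbours (Euclidean distance) whose label differs from its own.
    This is the kDN measure, used here as an assumed stand-in for the
    paper's notion of a hard classifying sample."""
    scores = []
    for i, xi in enumerate(X):
        # Sort all other samples by distance; ties are broken by index.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        disagree = sum(1 for j in neighbours if y[j] != y[i])
        scores.append(disagree / len(neighbours))
    return scores


def resample_hard(X, y, k=5, threshold=0.5, copies=2):
    """Hypothetical resampling step: append `copies` extra duplicates of
    every sample whose hardness score is >= threshold, so hard samples
    carry more weight when the classifier is trained."""
    scores = knn_disagreement(X, y, k)
    X_out, y_out = list(X), list(y)
    for xi, yi, s in zip(X, y, scores):
        if s >= threshold:
            X_out.extend([xi] * copies)
            y_out.extend([yi] * copies)
    return X_out, y_out


# Toy imbalanced data: 6 majority (label 0) and 4 minority (label 1)
# samples; the minority point (0.5, 0.5) sits inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1),
     (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

scores = knn_disagreement(X, y, k=3)
print([i for i, s in enumerate(scores) if s >= 0.5])  # prints [6]
X_res, y_res = resample_hard(X, y, k=3, threshold=0.5, copies=2)
```

In the second layer, the resampled set would then be used to train an extreme gradient boosting model, for example `xgboost.XGBClassifier`. Passing per-sample weights through the `sample_weight` argument of its `fit` method is an alternative to physically duplicating rows.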