A Clustering Resampling Stacked Ensemble Method for Imbalance Classification Problem

Jian Li, Jinlian Du, Xiao Zhang
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys57074.2022.00124
2022-01-01
Abstract: Existing research shows that ensemble methods based on resampling give the best results on imbalanced classification tasks, whereas resampling or ensemble learning used on its own gives relatively mediocre results. Of the three common ensemble strategies (boosting, bagging, and stacking), stacking often trains faster than iterative boosting because its base classifiers can be trained in parallel. Bagging combines its base classifiers with a weighted average at the decision level, whereas stacking uses a machine learning model to make the final decision, which achieves higher accuracy than a weighted-average decision. The effectiveness of stacking depends heavily on the diversity of the base classifiers: it usually works best when every base classifier learns well and the differences among the base classifiers are as large as possible. The key to imbalanced classification is handling both the class imbalance and the class overlap in the training set. In this study, we address these two problems hierarchically and propose a resampling-based stacking ensemble method combined with clustering. The proposed method uses support vector machines as base classifiers and uses clustering to counter two weaknesses of resampling: the loss of important information and the introduction of noise. This also increases the differences among the base classifiers. We first use the proposed stacking ensemble to handle class overlap and then use a cost-sensitive algorithm to handle class imbalance. A statistical significance analysis of the experimental results shows that the proposed method performs better than existing imbalanced ensemble methods.
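The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the number of clusters, the resampling rule, or the meta-learner, so everything specific here is an assumption rather than the authors' method: k-means partitions the majority class, each cluster is paired with all minority samples to train one SVM, and a class-weighted logistic regression stands in for the cost-sensitive decision stage. The function names `fit_cluster_stack`, `stack_features`, and `predict_cluster_stack` are hypothetical, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def stack_features(bases, X):
    """Meta-features: one SVM decision value per base classifier."""
    return np.column_stack([b.decision_function(X) for b in bases])


def fit_cluster_stack(X, y, n_clusters=3, seed=0):
    """Train one SVM per majority-class cluster, then a cost-sensitive meta-learner.

    Assumes binary labels with 1 as the minority class (an illustrative convention).
    """
    # Hold out half of the data for the meta-learner so it is not fitted on
    # outputs the base SVMs produced for their own training points.
    X_base, X_meta, y_base, y_meta = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    X_min, X_maj = X_base[y_base == 1], X_base[y_base == 0]

    # Assumption: k-means on the majority class; the paper may cluster differently.
    cluster = KMeans(n_clusters=n_clusters, n_init=10,
                     random_state=seed).fit_predict(X_maj)

    bases = []
    for k in range(n_clusters):
        # Each base SVM sees one majority cluster plus every minority sample,
        # so the base classifiers differ while no majority data is discarded overall.
        X_k = np.vstack([X_maj[cluster == k], X_min])
        y_k = np.r_[np.zeros((cluster == k).sum(), dtype=int),
                    np.ones(len(X_min), dtype=int)]
        bases.append(SVC(kernel="rbf", gamma="scale").fit(X_k, y_k))

    # Assumption: class_weight="balanced" plays the role of the cost-sensitive step.
    meta = LogisticRegression(class_weight="balanced", max_iter=1000)
    meta.fit(stack_features(bases, X_meta), y_meta)
    return bases, meta


def predict_cluster_stack(bases, meta, X):
    """Predict labels by feeding the base SVM outputs to the meta-learner."""
    return meta.predict(stack_features(bases, X))


# Synthetic 9:1 imbalanced data, used only to exercise the sketch.
X, y = make_classification(n_samples=600, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
bases, meta = fit_cluster_stack(X_tr, y_tr)
y_pred = predict_cluster_stack(bases, meta, X_te)
print("minority recall:", ((y_pred == 1) & (y_te == 1)).sum() / (y_te == 1).sum())
```

Splitting the training data before fitting the meta-learner is a simplification of the out-of-fold predictions that stacking normally uses; it keeps the sketch short at the cost of giving each stage less data.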