A weighted hybrid ensemble method for classifying imbalanced data

Jiakun Zhao, Ju Jin, Si Chen, Ruifeng Zhang, Bilin Yu, Qingfang Liu
DOI: https://doi.org/10.1016/j.knosys.2020.106087
2020-09-01
Abstract: Most real-world datasets are imbalanced. Data imbalance occurs when the number of instances in some classes greatly exceeds the number of instances in other classes. In both data mining and machine learning, data imbalance can have adverse effects. Current methods for handling data imbalance can be divided into data-level methods, algorithm-level methods and hybrid methods. In this paper, we propose a weighted hybrid ensemble method, called WHMBoost, for classifying imbalanced data in binary classification tasks. Within the framework of the boosting algorithm, the method combines two data sampling methods and two base classifiers, and assigns a weight to each sampling method and each base classifier so that their advantages complement one another. The performance of WHMBoost has been evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost and SMOTEBoost, using AUC, F-Measure and Geometric Mean as the performance evaluation criteria. Experimental results show significant improvement over the other methods, and it can be concluded that WHMBoost is a promising and effective algorithm for dealing with imbalanced datasets.
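Two of the evaluation criteria named in the abstract, F-Measure and Geometric Mean, can be computed directly from a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: the function name `imbalance_metrics`, the choice of label 1 as the minority (positive) class, and the toy labels are all assumptions. It uses the standard definitions, F-Measure as the harmonic mean of precision and recall, and Geometric Mean as the square root of recall times specificity.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (f_measure, g_mean) for binary labels.

    `positive` marks the minority class (an assumption of this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean

# Toy example: 2 minority instances out of 10 (hypothetical data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
f_measure, g_mean = imbalance_metrics(y_true, y_pred)
print(f"F-Measure={f_measure:.3f}, G-Mean={g_mean:.3f}")
```

In this toy case plain accuracy is 0.8 even though only half of the minority instances are found, which is why such metrics are preferred over accuracy on imbalanced data.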
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper addresses data imbalance in binary classification tasks. It proposes a weighted hybrid ensemble method, WHMBoost, which combines two data sampling techniques and two base classifiers and assigns a weight to each sampling method and each base classifier to improve classification performance. The main contributions are:
1. An adjustable random balancing algorithm, which improves on the random balancing algorithm and whose feasibility is validated experimentally.
2. A weighted hybrid ensemble method that combines two sampling methods and two base classifiers.
3. Experimental results showing that the proposed ensemble method outperforms other existing methods on multiple imbalanced datasets.
4. A demonstration that the proposed ensemble model performs better than single models in certain cases.
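The idea of weighting two base classifiers inside one boosting round can be illustrated with a minimal sketch. The text above does not give WHMBoost's actual weighting rule, so this example substitutes the classic AdaBoost coefficient, alpha = 0.5 * ln((1 - error) / error), normalized across the two classifiers. The function names `adaboost_alpha` and `combine_two`, the error values and the scores are all hypothetical.

```python
import math

def adaboost_alpha(error, eps=1e-10):
    """Classic AdaBoost coefficient for a weighted error rate in (0, 1)."""
    error = min(max(error, eps), 1 - eps)  # keep the logarithm finite
    return 0.5 * math.log((1 - error) / error)

def combine_two(scores_a, scores_b, error_a, error_b):
    """Blend two classifiers' positive-class scores.

    Weights are normalized AdaBoost coefficients (an assumption of this
    sketch, not the rule used by WHMBoost). A classifier no better than
    chance (error >= 0.5) receives zero weight.
    """
    alpha_a = max(adaboost_alpha(error_a), 0.0)
    alpha_b = max(adaboost_alpha(error_b), 0.0)
    total = alpha_a + alpha_b
    if total == 0.0:
        w_a = w_b = 0.5  # fall back to a plain average
    else:
        w_a, w_b = alpha_a / total, alpha_b / total
    blended = [w_a * a + w_b * b for a, b in zip(scores_a, scores_b)]
    return blended, (w_a, w_b)

# Hypothetical scores from two base classifiers on three instances.
blended, (w_a, w_b) = combine_two(
    scores_a=[0.9, 0.2, 0.6],
    scores_b=[0.7, 0.4, 0.3],
    error_a=0.2,  # the more accurate classifier gets the larger weight
    error_b=0.4,
)
print([round(s, 3) for s in blended], round(w_a, 3), round(w_b, 3))
```

The same normalization could in principle be applied to the outputs of two sampling methods; how WHMBoost actually derives and updates its weights should be taken from the paper itself.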