ICPRAI 2018 SI: On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Rafael M. O. Cruz, Mariana A. Souza, Robert Sabourin, George D. C. Cavalcanti
DOI: https://doi.org/10.48550/arXiv.1811.10481
2018-11-29
Abstract: Class imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class, which has a large number of instances. Ensembles of classifiers have been reported to yield promising results. However, the majority of ensemble methods applied to imbalanced learning are static ones. Moreover, they only deal with binary imbalanced problems. Hence, this paper presents an empirical analysis of dynamic selection techniques and data preprocessing methods for dealing with multi-class imbalanced problems. We considered five variations of preprocessing methods and fourteen dynamic selection schemes. Our experiments conducted on 26 multi-class imbalanced problems show that the dynamic ensemble improves the AUC and the G-mean as compared to the static ensemble. Moreover, data preprocessing plays an important role in such cases.
Machine Learning
What problem does this paper attempt to address?
This paper addresses classification in multi-class imbalanced learning. It focuses on the following points:

1. **Challenges of multi-class imbalanced datasets**: In multi-class imbalanced datasets, some classes have far more samples than others. Traditional classifiers therefore tend to favor the majority classes, which leads to poor prediction performance on the minority classes. This kind of imbalance is common in real-world applications such as fraud detection, telephone call monitoring, biomedical diagnosis, and image retrieval.
2. **Application of dynamic selection techniques**: Ensemble learning methods have shown promise on imbalanced data, but most ensemble methods applied to imbalanced learning are static and only deal with binary problems. The paper therefore presents an empirical analysis of Dynamic Selection (DS) techniques on multi-class imbalanced problems. A dynamic selection technique chooses the most appropriate classifier, or set of classifiers, for each new query sample instead of using the same ensemble for every sample.
3. **Importance of data preprocessing**: The authors also study data preprocessing methods in combination with dynamic selection. These methods aim to balance the class distribution of the original dataset, reducing the impact of imbalance on the learning process. The experimental results indicate that data preprocessing plays an important role when dynamic selection is used.
4. **Research questions**:
   - What role does data preprocessing play in the performance of dynamic selection techniques?
   - Which data preprocessing technique is most suitable for dynamic and static ensemble combinations?
   - Does a dynamic ensemble perform better than a static ensemble?
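The dynamic selection idea in point 2 can be sketched in a few lines. The example below is a minimal illustration, not the paper's experimental setup: it assumes a k-nearest-neighbour region of competence and a simple local-accuracy selection rule, which is only one of many possible dynamic selection schemes. The names `k_nearest`, `select_by_local_accuracy`, the toy classifiers, and the validation set `X_dsel`/`y_dsel` are all hypothetical.

```python
import math


def k_nearest(query, X, k):
    """Return indices of the k points in X closest to query (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(query, X[i]))
    return order[:k]


def select_by_local_accuracy(query, pool, X_dsel, y_dsel, k=3):
    """Pick the classifier that is most accurate on the query's k nearest
    validation samples (its assumed region of competence)."""
    region = k_nearest(query, X_dsel, k)

    def local_accuracy(clf):
        return sum(clf(X_dsel[i]) == y_dsel[i] for i in region) / len(region)

    # max() returns the first classifier reaching the highest local accuracy.
    return max(pool, key=local_accuracy)


# Hypothetical pool: three fixed rules over one feature, three classes (0, 1, 2).
def clf_a(x):
    return 0 if x[0] < 2.0 else 1


def clf_b(x):
    return 1 if x[0] < 5.0 else 2


def clf_c(x):
    return 0  # always predicts class 0


pool = [clf_a, clf_b, clf_c]

# Hypothetical labelled validation data used only for selection.
X_dsel = [(0.5,), (1.0,), (1.5,), (3.0,), (3.5,), (4.0,), (6.0,), (6.5,)]
y_dsel = [0, 0, 0, 1, 1, 1, 2, 2]

query = (6.2,)
best = select_by_local_accuracy(query, pool, X_dsel, y_dsel, k=2)
prediction = best(query)
```

Here only `clf_b` is correct on the two validation samples nearest to the query, so it is the one selected for that query. A static ensemble would instead apply the same combination of all three classifiers to every sample.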
The authors compare five variations of data preprocessing methods and fourteen dynamic selection schemes on 26 multi-class imbalanced datasets. They report that dynamic ensembles improve the AUC and the G-mean compared with static ensembles, and that data preprocessing plays an important role in these results.
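To see why a metric such as the G-mean is informative on imbalanced data, the sketch below computes it under a common multi-class definition: the geometric mean of the per-class recalls. That definition is an assumption here, since this summary does not state the exact formula used in the paper, and the helper names and toy labels are hypothetical.

```python
import math
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Fraction of correctly predicted samples for each class present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multi-class G-mean)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))


# Toy imbalanced problem: 8 samples of class 0, 2 of class 1, 2 of class 2.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2]

# A classifier that always predicts the majority class.
majority_only = [0] * 12

# A classifier that misses two majority samples but finds every minority sample.
balanced = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]

score_majority = g_mean(y_true, majority_only)
score_balanced = g_mean(y_true, balanced)
```

The majority-only predictor is right on 8 of 12 samples, yet its G-mean is zero because its recall on both minority classes is zero. The second predictor recovers every minority sample while missing two majority samples, so all of its per-class recalls are nonzero and its G-mean is much higher.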