Automated Imbalanced Learning

Prabhant Singh, Joaquin Vanschoren
DOI: https://doi.org/10.48550/arXiv.2211.00376
2022-11-01
Abstract: Automated Machine Learning has grown very successful in automating the time-consuming, iterative tasks of machine learning model development. However, current methods struggle when the data is imbalanced. Since many real-world datasets are naturally imbalanced, and improper handling of this issue can lead to quite useless models, this issue should be handled carefully. This paper first introduces a new benchmark to study how different AutoML methods are affected by label imbalance. Second, we propose strategies to better deal with imbalance and integrate them into an existing AutoML framework. Finally, we present a systematic study which evaluates the impact of these strategies and find that their inclusion in AutoML systems significantly increases their robustness against label imbalance.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the poor performance of AutoML (Automated Machine Learning) systems on imbalanced datasets. Many real-world datasets are naturally imbalanced, and if the imbalance is not handled properly, the resulting model can perform very poorly on the minority class. The paper therefore has three goals:

1. **Propose new benchmarks**: To study how different AutoML methods behave under label imbalance, the paper introduces four new benchmarks with different levels of class imbalance.
2. **Propose solutions**: The paper proposes AutoBalance, an open-source AutoML framework that integrates balancing strategies into an existing AutoML process so that it handles imbalanced data better.
3. **Systematic evaluation**: The paper evaluates the impact of these strategies in a series of experiments. The results show that integrating them into an AutoML system significantly improves its robustness to label imbalance.

Overall, the paper aims to make AutoML systems handle imbalanced datasets more effectively, and so to improve model performance in practical applications.
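To make the motivation concrete, the following is a minimal sketch, not the paper's implementation. It shows why plain accuracy can hide a useless model on imbalanced labels, and what one generic balancing strategy (random oversampling) does to the class counts. The 95/5 split, the function names, and the choice of random oversampling are all assumptions made for illustration; the summary above does not say which strategies AutoBalance uses.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority samples
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, t in enumerate(y) if t == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


# Assumed toy data: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(len(y_true))]

# A degenerate model that always predicts the majority class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)           # high, despite missing every minority sample
bal = balanced_accuracy(y_true, y_pred)  # chance level for two classes

X_bal, y_bal = random_oversample(X, y_true)
print(acc, bal, Counter(y_bal))
```

The majority-class predictor scores well on accuracy but only at chance level on balanced accuracy, which is the failure mode the benchmarks are meant to expose. In practice, oversampling would be applied to the training split only, so that duplicated samples do not leak into evaluation.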