Automated Imbalanced Learning

Prabhant Singh, Joaquin Vanschoren

DOI: https://doi.org/10.48550/arXiv.2211.00376

2022-11-01

Abstract: Automated Machine Learning has grown very successful in automating the time-consuming, iterative tasks of machine learning model development. However, current methods struggle when the data is imbalanced. Since many real-world datasets are naturally imbalanced, and improper handling of this issue can lead to quite useless models, this issue should be handled carefully. This paper first introduces a new benchmark to study how different AutoML methods are affected by label imbalance. Second, we propose strategies to better deal with imbalance and integrate them into an existing AutoML framework. Finally, we present a systematic study which evaluates the impact of these strategies and find that their inclusion in AutoML systems significantly increases their robustness against label imbalance.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the poor performance of AutoML (Automated Machine Learning) on imbalanced datasets. Many real-world datasets are naturally imbalanced, and if the imbalance is not handled properly, the resulting model can perform very poorly on the minority class. The paper therefore has three goals:

1. **Propose new benchmarks**: To study how different AutoML methods behave under label imbalance, the paper introduces four new benchmarks with different levels of class imbalance.
2. **Propose solutions**: The paper proposes AutoBalance, an open-source AutoML framework that integrates balancing strategies into an existing AutoML process so that it handles imbalanced data better.
3. **Systematic evaluation**: The paper evaluates the impact of these strategies through a series of experiments. The results show that integrating them into an AutoML system significantly improves its robustness to label imbalance.

Overall, the paper aims to make AutoML systems handle imbalanced datasets more effectively, and so to improve model performance in practical applications.
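To make the imbalance problem concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation or the AutoBalance API. The toy data and the helper names `accuracy`, `balanced_accuracy`, and `random_oversample` are assumptions introduced here. The sketch shows how a classifier that always predicts the majority class can look accurate while ignoring the minority class, and how random oversampling, one common balancing strategy, evens out the class counts before training.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until each class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy dataset: 95 majority-class and 5 minority-class samples.
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A degenerate model that always predicts the majority class.
majority_pred = [0] * 100

acc = accuracy(y_true, majority_pred)           # high despite missing every positive
bal = balanced_accuracy(y_true, majority_pred)  # exposes the failure on class 1

# Rebalance the training data by oversampling the minority class.
X_res, y_res = random_oversample(X, y_true)
resampled_counts = Counter(y_res)
```

On this toy data the always-majority predictor reaches 95% plain accuracy but only 50% balanced accuracy. That gap is why imbalance-aware metrics and balancing steps matter when an AutoML system searches over pipelines.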
