Learning Confidence Bounds for Classification with Imbalanced Data

Matt Clifford, Jonathan Erskine, Alexander Hepburn, Raúl Santos-Rodríguez, Dario Garcia-Garcia

2024-10-01

Abstract: Class imbalance poses a significant challenge in classification tasks, where traditional approaches often lead to biased models and unreliable predictions. Undersampling and oversampling techniques have been commonly employed to address this issue, yet they suffer from inherent limitations stemming from their simplistic approach, such as loss of information and additional biases, respectively. In this paper, we propose a novel framework that leverages learning theory and concentration inequalities to overcome the shortcomings of traditional solutions. We focus on understanding the uncertainty in a class-dependent manner, as captured by confidence bounds that we directly embed into the learning process. By incorporating class-dependent estimates, our method can effectively adapt to the varying degrees of imbalance across different classes, resulting in more robust and reliable classification outcomes. We empirically show how our framework provides a promising direction for handling imbalanced data in classification tasks, offering practitioners a valuable tool for building more accurate and trustworthy models.

Machine Learning

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper attempts to address the issue of class imbalance in classification tasks. In real-world datasets, the number of samples in different classes is often imbalanced. For example, in medical diagnosis datasets, the number of samples of healthy individuals is much greater than that of individuals with a rare disease. This imbalance can cause traditional classification algorithms to be biased towards the majority class, resulting in poor predictive performance on the minority class.

Although there are existing methods such as undersampling, oversampling, and cost-sensitive learning to tackle this problem, these methods have their limitations, such as information loss and the introduction of new biases. The paper proposes a new framework that leverages learning theory and concentration inequalities to overcome the shortcomings of traditional methods. This approach embeds class-dependent confidence intervals directly into the learning process to understand and handle the uncertainty of the minority class. This can effectively adapt to different degrees of class imbalance, thereby improving the robustness and reliability of classification results.

Specifically, the method in the paper adjusts the bias term of the pre-trained classifier to reflect the uncertainty caused by the smaller number of minority class samples. This approach is not only more rigorous theoretically but also performs well in practical applications, especially when the pre-trained classifier has already learned a good representation of the data.
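The idea of a class-dependent confidence bound feeding into a bias adjustment can be illustrated with a minimal sketch. The summary does not state which concentration inequality or update rule the paper uses, so everything here is an assumption: the sketch uses Hoeffding's inequality for the per-class bound and a minimax objective over a hand-picked grid of bias shifts, and the names `hoeffding_radius`, `class_error_upper_bounds`, and `pick_bias` are hypothetical.

```python
import math


def hoeffding_radius(n, delta=0.05):
    """Two-sided Hoeffding radius for the mean of n i.i.d. values in [0, 1].

    With probability at least 1 - delta, the true mean lies within this
    radius of the empirical mean. Fewer samples give a wider radius.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def class_error_upper_bounds(scores_neg, scores_pos, bias, delta=0.05):
    """Upper confidence bounds on the per-class error rates.

    Assumed decision rule (not from the paper): predict positive when
    score + bias > 0. Scores come from some pre-trained classifier.
    """
    err_neg = sum(s + bias > 0 for s in scores_neg) / len(scores_neg)
    err_pos = sum(s + bias <= 0 for s in scores_pos) / len(scores_pos)
    ub_neg = min(1.0, err_neg + hoeffding_radius(len(scores_neg), delta))
    ub_pos = min(1.0, err_pos + hoeffding_radius(len(scores_pos), delta))
    return ub_neg, ub_pos


def pick_bias(scores_neg, scores_pos, candidates, delta=0.05):
    """Choose the bias shift that minimises the worse of the two bounds.

    This minimax objective is an illustrative stand-in, chosen so that
    the wider minority-class radius actually influences the result.
    """
    return min(
        candidates,
        key=lambda b: max(
            class_error_upper_bounds(scores_neg, scores_pos, b, delta)
        ),
    )
```

Because the radius shrinks as the sample count grows, the minority class carries the larger uncertainty term, so under this objective the selected bias tends to favour lower empirical error on the minority class. Only the bias is changed; the scores, and hence the learned representation, are left untouched.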

A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

A New Sampling Approach for Classification of Imbalanced Data Sets with High Density.

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Class Uncertainty: A Measure to Mitigate Class Imbalance

Adaptive Sampling With Optimal Cost For Class-Imbalance Learning

CCL: Class-Wise Curriculum Learning for Class Imbalance Problems

A Bilevel Optimization Framework for Imbalanced Data Classification

Restoring balance: principled under/oversampling of data for optimal classification

A cluster impurity-based hybrid resampling for imbalanced classification problems

A New Approach for Imbalanced Data Classification Based on Minimize Loss Learning

Rethinking Class Imbalance in Machine Learning

Conformal-in-the-Loop for Learning with Imbalanced Noisy Data

Noise-robust Oversampling for Imbalanced Data Classification

A Theoretical Analysis of the Learning Dynamics under Class Imbalance

Uncertainty-Aware Learning against Label Noise on Imbalanced Datasets

Handling Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Improved Randomized Learning Algorithms for Imbalanced and Noisy Educational Data Classification

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance