Abstract: Current research on imbalanced data recognises that class imbalance is aggravated by other data intrinsic characteristics, among which class overlap stands out as one of the most harmful. The combination of these two problems creates a new and difficult scenario for classification tasks and has been discussed in several research works over the past two decades. In this paper, we argue that although some insightful information can be derived from related research, the joint-effect of class overlap and imbalance is still not fully understood, and we advocate for the need to move towards a unified view of the class overlap problem in imbalanced domains. To that end, we start by performing a thorough analysis of existing literature on the joint-effect of class imbalance and overlap, elaborating on important details left undiscussed in the original papers, namely the impact of data domains with different characteristics and the behaviour of classifiers with distinct learning biases. This leads to the hypothesis that class overlap comprises multiple representations, which must be accurately measured and analysed in order to provide a full characterisation of the problem. Accordingly, we devise two novel taxonomies, one for class overlap measures and the other for class overlap-based approaches, both resonating with the distinct representations of class overlap identified. This paper therefore presents a global and unique view on the joint-effect of class imbalance and overlap, from precursor work to recent developments in the field. It meticulously discusses some concepts taken as implicit in previous research, explores new perspectives in light of the limitations found, and presents new ideas that will hopefully inspire researchers to move towards a unified view on the problem and the development of suitable strategies for imbalanced and overlapped domains.
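To make the point about measuring overlap concrete, the following is a minimal sketch of one generic, neighbourhood-based way to quantify class overlap, in the spirit of kNN-based measures such as the R value. It is not one of the measures or taxonomies proposed in the paper. An instance is treated as overlapped when more than `theta` of its `k` nearest neighbours belong to another class. The function names `overlapped_flags` and `overlap_report`, the brute-force Euclidean kNN, and the default values of `k` and `theta` are all illustrative assumptions. The sketch reports the overlap fraction both overall and per class, which shows how an aggregate figure can hide heavy overlap concentrated in the minority class.

```python
import math
from collections import Counter


def overlapped_flags(X, y, k=7, theta=3):
    """Flag each instance whose k nearest neighbours contain more than
    `theta` instances of a different class (brute-force Euclidean kNN).
    k and theta are illustrative defaults, not values from the paper."""
    flags = []
    for i, xi in enumerate(X):
        # Distances to every other instance; ties are broken by index.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        foreign = sum(1 for _, j in dists[:k] if y[j] != y[i])
        flags.append(foreign > theta)
    return flags


def overlap_report(X, y, k=7, theta=3):
    """Return the imbalance ratio, the overall fraction of overlapped
    instances, and the same fraction computed separately for each class."""
    flags = overlapped_flags(X, y, k, theta)
    totals = Counter(y)
    hits = Counter(label for label, flagged in zip(y, flags) if flagged)
    imbalance_ratio = max(totals.values()) / min(totals.values())
    overall = sum(flags) / len(y)
    per_class = {c: hits[c] / totals[c] for c in totals}
    return imbalance_ratio, overall, per_class


if __name__ == "__main__":
    # Toy 1-D data: 20 majority points on a line, 2 minority points
    # embedded among them and 3 minority points in a separate cluster.
    X = [(float(i),) for i in range(20)] + [(5.5,), (12.5,), (100.0,), (101.0,), (102.0,)]
    y = [0] * 20 + [1] * 5
    ir, overall, per_class = overlap_report(X, y, k=3, theta=1)
    print(ir, overall, per_class)  # 4.0 0.08 {0: 0.0, 1: 0.4}
```

On this toy data only 8% of all instances are flagged, yet 40% of the minority class lies in overlapped regions while the majority class shows none. A single aggregate number would therefore understate how strongly overlap affects the class that usually matters most.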

Addressing Class Overlap under Imbalanced Distribution: An Improved Method and Two Metrics

A Novel SVM Modeling Approach for Highly Imbalanced and Overlapping Classification

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

On the joint-effect of class imbalance and overlap: a critical review

Class overlap handling methods in imbalanced domain: A comprehensive survey

A Comprehensive Investigation of the Impact of Class Overlap on Software Defect Prediction

Novel resampling algorithms with maximal cliques for class-imbalance problems

A Closer Look at AUROC and AUPRC under Class Imbalance

Hybrid SVM algorithm oriented to classifying imbalanced datasets

A quantum-based oversampling method for classification of highly imbalanced and overlapped data

Empirical analysis of performance assessment for imbalanced classification

Handling Inter-class and Intra-class Imbalance in Class-imbalanced Learning

Detecting Overlapping Areas in Unbalanced High-Dimensional Data Using Neighborhood Rough Set and Genetic Programming

Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

Measuring Class-Imbalance Sensitivity of Deterministic Performance Evaluation Metrics

Handling Class Imbalance and Overlap with a Hesitation-based Instance Selection Method

Nearest neighbors and density-based undersampling for imbalanced data classification with class overlap

An effective method using clustering-based adaptive decomposition and editing-based diversified oversampling for multi-class imbalanced datasets

A cluster impurity-based hybrid resampling for imbalanced classification problems

Rethinking Class Imbalance in Machine Learning

Iterative Metric Learning for Imbalance Data Classification