Class overlap handling methods in imbalanced domain: A comprehensive survey
Anil Kumar, Dinesh Singh, Rama Shankar Yadav
DOI: https://doi.org/10.1007/s11042-023-17864-8
IF: 2.577
2024-01-13
Multimedia Tools and Applications
Abstract: Class overlap in imbalanced datasets is one of the most common challenges for researchers working on deep learning (DL), machine learning (ML), and big data (BD) based applications. Class overlap and class imbalance are intrinsic data characteristics that negatively affect the performance of classification models. Data-level, algorithm-level, ensemble, and hybrid methods are the most commonly used solutions for reducing the bias of standard classification models towards the majority class. Data-level methods change the distribution of class instances and can therefore increase information loss and overfitting. Algorithm-level methods modify the structure of the learning algorithm so that it gives more weight to misclassified minority class instances during the learning phases. However, such changes to the algorithm are harder for users to adopt. To overcome the issues in these methods, an in-depth discussion of the state-of-the-art methods is required and is presented here. In this survey, we present a detailed discussion of the existing methods for handling class overlap in imbalanced datasets, covering their advantages, disadvantages, limitations, and the key performance metrics on which each method outperformed others. A detailed comparative analysis, drawn mainly from papers of recent years, summarizes the research gaps and future directions for researchers in ML, DL, and BD-based applications.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
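
The abstract contrasts data-level methods, which change the class distribution, with algorithm-level methods, which give more weight to minority class instances. The sketch below is a minimal illustration of one simple instance of each idea, not a method taken from the survey: random undersampling for the data level and inverse-frequency class weights for the algorithm level. The function names `random_undersample` and `inverse_frequency_weights` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Data-level sketch: randomly drop instances until every class
    has as many instances as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, n_min))  # keep n_min rows per class
        y_out.extend([label] * n_min)
    return X_out, y_out


def inverse_frequency_weights(y):
    """Algorithm-level sketch: weight each class by
    n_samples / (n_classes * class_count), so rarer classes weigh more."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Toy imbalanced data (hypothetical): 8 majority vs 2 minority instances.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(X, y)
weights = inverse_frequency_weights(y)

print(Counter(y_bal))  # both classes reduced to 2 instances
print(weights)         # the minority class receives the larger weight
```

In this toy case undersampling discards six of the eight majority instances, which is the kind of information loss the abstract attributes to data-level methods. The weighting approach keeps all the data, but the weights then have to be passed into the learner's loss function.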