Novel resampling algorithms with maximal cliques for class-imbalance problems

Long-hui Wang, Qi Dai, Tony Du, Li-fang Chen
DOI: https://doi.org/10.1016/j.cie.2024.110754
IF: 7.18
2024-12-06
Computers & Industrial Engineering
Abstract: Class imbalance significantly degrades the performance of classifiers. Although researchers have proposed resampling methods to address this problem, these methods often struggle with class overlap and small disjuncts, and they consider only one-way relationships among instances. Most of these techniques emphasize neighbor relationships between pairs of instances while ignoring the global relationships within the dataset. To address this, we represented instances as nodes in a graph, with edges based on neighbor relationships. We constructed a neighbor graph of the dataset and identified its maximal cliques. Building on this concept, we proposed two resampling techniques: Maximal Clique-based Oversampling (MCSO) and Maximal Clique-based Undersampling (MCSU). MCSO uses maximal cliques to tackle the small disjunct problem, while MCSU addresses class overlap from a global perspective. We tested both methods with three base classifiers (CART, RF, and GBDT) on 35 public datasets. Our results showed that MCSO and MCSU outperform state-of-the-art methods in terms of AUC and G-mean. Additionally, the Friedman test and Nemenyi post-hoc test showed that MCSO and MCSU are significantly better than the other algorithms compared, demonstrating their superior performance.
Computer Science, Interdisciplinary Applications; Engineering, Industrial
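The abstract describes the core idea only at a high level: instances become graph nodes, neighbor relationships become edges, and resampling is driven by the graph's maximal cliques. The sketch below illustrates that pipeline under stated assumptions and is not the authors' MCSO/MCSU. It assumes edges come from *mutual* k-nearest neighbors (the abstract says only "based on neighbor relationships"), enumerates maximal cliques with the standard Bron-Kerbosch algorithm with pivoting, and then applies two hypothetical resampling rules: interpolate one synthetic minority sample inside each clique made only of minority instances, and drop majority instances from cliques that mix classes. All function names (`mutual_knn_graph`, `maximal_cliques`, `oversample_pure_cliques`, `undersample_mixed_cliques`), the parameter `k`, and the toy data are illustrative assumptions.

```python
import math
import random


def mutual_knn_graph(points, k):
    """Adjacency sets of a mutual k-NN graph (edge rule assumed, not from the paper)."""
    n = len(points)
    knn = []
    for i in range(n):
        # Rank every other instance by Euclidean distance; keep the k closest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    # Keep an edge only if each endpoint lists the other among its k nearest.
    return {i: {j for j in knn[i] if i in knn[j]} for i in range(n)}


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended by any vertex
            return
        # Pivot on the vertex adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def oversample_pure_cliques(points, labels, cliques, minority, rng):
    """Hypothetical MCSO-like rule: one synthetic sample per pure minority clique."""
    synthetic = []
    for clique in cliques:
        members = sorted(clique)
        if len(members) < 2 or any(labels[i] != minority for i in members):
            continue
        dims = len(points[0])
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(dims)]
        # Interpolate from a random member toward the clique centroid.
        anchor = points[rng.choice(members)]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (c - a) for a, c in zip(anchor, centroid)))
    return synthetic


def undersample_mixed_cliques(labels, cliques, majority):
    """Hypothetical MCSU-like rule: drop majority members of class-mixed cliques."""
    drop = set()
    for clique in cliques:
        if len({labels[i] for i in clique}) > 1:
            drop |= {i for i in clique if labels[i] == majority}
    return [i for i in range(len(labels)) if i not in drop]


# Toy data: a tight minority group, a majority group, and an overlapping pair.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (0.5, 0.5), (0.55, 0.5)]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

adj = mutual_knn_graph(points, k=2)
cliques = maximal_cliques(adj)
synthetic = oversample_pure_cliques(points, labels, cliques, minority=1,
                                    rng=random.Random(0))
kept = undersample_mixed_cliques(labels, cliques, majority=0)
print(sorted(sorted(c) for c in cliques))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
print(kept)                                # [0, 1, 2, 3, 4, 5, 6]
```

Mutual neighbors are used here so that every edge reflects a two-way relationship, which echoes the abstract's criticism of one-way relationships; a plain k-NN graph would be an equally plausible reading. Bron-Kerbosch is exponential in the worst case, so on large datasets a sparse neighbor graph (small `k`) keeps clique enumeration tractable.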
What problem does this paper attempt to address?