SW: A Weighted Space Division Framework for Imbalanced Problems with Label Noise

Min Li, Hao Zhou, Qun Liu, Guoyin Wang
DOI: https://doi.org/10.1016/j.knosys.2022.109233
IF: 8.139
2022-01-01
Knowledge-Based Systems
Abstract: Imbalanced data learning is a ubiquitous challenge in data mining and machine learning, and the ubiquity and inevitability of noise can make the resulting performance degradation more severe. The synthetic minority oversampling technique (SMOTE) and its variants have been proposed to address this problem. These variants either emphasize specific areas of the sample space or combine SMOTE with different noise filters; as a result, they introduce additional parameters that are difficult to optimize or rely on specific noise filters. Furthermore, SMOTE-based methods randomly select nearest-neighbor samples and perform random interpolation to synthesize new samples, without considering how chaotic the sample space is. In this study, a framework called SW is proposed, which performs weighted sampling by calculating the chaos of the sample space. It is a general, robust and adaptive framework that copes with noisy imbalanced datasets and can be combined with various oversampling algorithms to improve their performance. In the SW framework, the complete random forest (CRF) is introduced to divide the sample space and adaptively assign weights that distinguish and filter noisy and outlier samples. When synthesizing a new sample, the SW framework selects the seed sample’s neighbors and calculates an informed position using the derived weights, bringing the new sample closer to the safe area. Experimental results on 16 benchmark datasets and eight classic classifiers with eight pairs of representative oversampling algorithms demonstrate the SW framework’s effectiveness. The SW framework yields significant improvements in high-noise situations. In particular, SW-kmeans-SMOTE improved by approximately 5% on average across all the metrics. Code and framework are available at https://github.com/dream-lm/SW_framework.
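The contrast the abstract draws between random interpolation and weight-informed placement can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight values, and the rule `gap = w_neighbor / (w_seed + w_neighbor)` are assumptions chosen only to show how per-sample weights could pull a synthetic point toward the more trusted endpoint. In the paper the weights come from the complete random forest, which is not modeled here.

```python
import random


def smote_interpolate(seed, neighbor, rng):
    """Plain SMOTE step: a uniformly random point on the segment seed -> neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(seed, neighbor)]


def weighted_interpolate(seed, neighbor, w_seed, w_neighbor):
    """Hypothetical weighted step (illustrative assumption, not the SW formula).

    Weights are treated as non-negative "safety" scores; the new point is
    placed nearer to the endpoint with the higher weight.
    """
    total = w_seed + w_neighbor
    # With no weight information, fall back to the midpoint.
    gap = 0.5 if total == 0 else w_neighbor / total
    return [s + gap * (n - s) for s, n in zip(seed, neighbor)]


seed = [0.0, 0.0]
neighbor = [4.0, 2.0]
rng = random.Random(0)  # fixed seed so the random draw is reproducible

plain = smote_interpolate(seed, neighbor, rng)
informed = weighted_interpolate(seed, neighbor, w_seed=3.0, w_neighbor=1.0)
print("plain SMOTE point:", plain)
print("weighted point:", informed)
```

With the assumed weights, the seed is three times as trusted as its neighbor, so the weighted point lands a quarter of the way along the segment, close to the seed, whereas the plain SMOTE point can fall anywhere on it.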