HGDO: an Oversampling Technique Based on Hypergraph Recognition and Gaussian Distribution

Liyan Jia, Zhiping Wang, Pengfei Sun, Peiwen Wang
DOI: https://doi.org/10.1016/j.ins.2024.120891
IF: 8.1
2024-01-01
Information Sciences
Abstract: The synthetic minority oversampling technique (SMOTE) is the most prevalent solution in class imbalance learning. Although SMOTE and its variants handle imbalanced data well in most cases, they do not fully exploit the structural information in the data as a whole, which leads to the propagation of noise. Some existing SMOTE variants remove noisy samples by adding an undersampling step. However, because data distributions are complex, truly noisy samples are difficult to identify accurately, which lowers modeling quality. To this end, we propose an oversampling technique based on hypergraph identification and Gaussian distribution (HGDO). First, the neighborhood of each sample is reconstructed by sparse representation to build a hypergraph model, and outliers and noisy samples are filtered according to this model. Then, the weight of each retained minority-class sample is determined from the distribution relationship between hyperedges and vertices. Finally, new samples are generated based on the Laplacian matrix and Gaussian distribution to balance the dataset. Comprehensive experimental analysis demonstrates the superiority of HGDO over several popular SMOTE variants.
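The filter, weight, and generate pipeline described in the abstract can be illustrated with a much-simplified sketch. This is not the authors' HGDO algorithm: the abstract does not give its formulas, so the code below swaps in k-nearest-neighbor hyperedges for the sparse-representation neighborhoods, a hyperedge "purity" score for the vertex weights, and per-feature Gaussian noise for the Laplacian-based generation step. All function names, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random

def knn_hyperedges(X, k):
    """Hyperedge j = sample j plus its k nearest neighbours (Euclidean).
    Assumption: kNN stands in for the paper's sparse-representation step."""
    edges = []
    for j, xj in enumerate(X):
        others = (i for i in range(len(X)) if i != j)
        order = sorted(others, key=lambda i: math.dist(X[i], xj))
        edges.append({j, *order[:k]})
    return edges

def gaussian_oversample(X, y, minority=1, k=5, threshold=0.5, seed=0):
    """Filter minority samples, weight the survivors, draw Gaussian samples.
    Assumes a binary problem; returns (new_points, kept_indices)."""
    rng = random.Random(seed)
    edges = knn_hyperedges(X, k)
    # Purity of a hyperedge: fraction of its vertices that are minority.
    purity = [sum(y[i] == minority for i in e) / len(e) for e in edges]
    # Vertex score: mean purity over the hyperedges that contain the vertex.
    score = []
    for i in range(len(X)):
        ps = [p for e, p in zip(edges, purity) if i in e]
        score.append(sum(ps) / len(ps))
    # Crude noise filter (assumed rule): drop low-scoring minority samples.
    keep = [i for i in range(len(X))
            if y[i] == minority and score[i] >= threshold]
    if not keep:
        raise ValueError("filter removed every minority sample")
    # Number of synthetic points needed to match the majority count.
    n_min = sum(label == minority for label in y)
    n_new = max(len(y) - 2 * n_min, 0)
    # Higher-scoring samples are chosen as seeds more often.
    seeds = rng.choices(keep, weights=[score[i] for i in keep], k=n_new)
    new = []
    for i in seeds:
        members = [X[m] for m in edges[i] if y[m] == minority]
        point = []
        for d in range(len(X[i])):
            vals = [m[d] for m in members]
            mu = sum(vals) / len(vals)
            # Local spread of the minority members of the seed's hyperedge.
            sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            point.append(rng.gauss(X[i][d], sigma))
        new.append(point)
    return new, keep

# Toy data: 40 majority points, 10 clustered minority points, and one
# minority-labelled point placed inside the majority region.
data_rng = random.Random(1)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
cluster = [[data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)] for _ in range(10)]
X = majority + cluster + [[0.0, 0.0]]
y = [0] * 40 + [1] * 11
new_points, kept = gaussian_oversample(X, y)
```

In this toy setup the isolated minority point sits only in hyperedges dominated by the majority class, so it scores low and is never used as a seed, while the clustered minority points are kept and sampled around. That mirrors, in spirit only, the abstract's idea of using structural information to avoid propagating noise.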
What problem does this paper attempt to address?