Conditional Wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification

Ming Zheng, Tong Li, Rui Zhu, Yahui Tang, Mingjing Tang, Leilei Lin, Zifei Ma
DOI: https://doi.org/10.1016/j.ins.2019.10.014
IF: 8.1
2020-01-01
Information Sciences
Abstract: In data mining, common classification algorithms cannot effectively learn from imbalanced data. Oversampling addresses this problem by creating data for the minority class in order to balance the class distribution before the model is trained. Traditional oversampling approaches are based on the Synthetic Minority Oversampling TEchnique (SMOTE), which focuses on local information but generates insufficiently realistic data. In contrast, the Generative Adversarial Network (GAN) captures the true data distribution in order to generate data for the minority class. However, both approaches are problematic owing to mode collapse and unstable training. To overcome these problems, we propose the Conditional Wasserstein GAN-Gradient Penalty (CWGAN-GP), a novel and efficient synthetic oversampling approach for imbalanced datasets, which can be constructed by adding auxiliary conditional information to the WGAN-GP. CWGAN-GP generates more realistic data and overcomes the aforementioned problems. Experiments on 15 benchmark datasets and two real imbalanced datasets empirically demonstrate that CWGAN-GP increases the quality of synthetic data; furthermore, our approach outperforms the other oversampling approaches on three evaluation metrics (F-measure, G-mean, and the area under the receiver operating characteristic curve) for five classifiers. Friedman and Nemenyi post hoc statistical tests also confirm that CWGAN-GP is superior to the other oversampling approaches.
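The abstract names F-measure and G-mean as evaluation metrics but does not define them. The sketch below assumes the standard binary definitions (F-measure as the harmonic mean of precision and recall, G-mean as the square root of sensitivity times specificity), with the minority class treated as the positive class. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return F-measure and G-mean for a binary task (assumed standard definitions)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # minority sample correctly detected
        elif t == positive:
            fn += 1  # minority sample missed
        elif p == positive:
            fp += 1  # majority sample flagged as minority
        else:
            tn += 1  # majority sample correctly rejected

    # Guard against empty denominators so degenerate classifiers score 0.
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"f_measure": f_measure, "g_mean": g_mean}


# Toy imbalanced labels: 2 minority (1) versus 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = imbalance_metrics(y_true, y_pred)

# A classifier that always predicts the majority class.
majority_only = imbalance_metrics(y_true, [0] * len(y_true))
```

Under these definitions, a classifier that always predicts the majority class gets a G-mean of 0 despite 80% accuracy on the toy labels, which is one reason such metrics are preferred over accuracy for imbalanced data.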