An ensemble oversampling method for imbalanced classification with prior knowledge via generative adversarial network
Yulin Zhang, Yuchen Liu, Yan Wang, Jie Yang
DOI: https://doi.org/10.1016/j.chemolab.2023.104775
IF: 4.175
2023-02-01
Chemometrics and Intelligent Laboratory Systems
Abstract: A growing number of real-world classification problems are class-imbalanced, with severely skewed class distributions, and therefore call for new algorithms that can learn from imbalanced datasets. This paper proposes G-GAN, a novel oversampling method for numerical imbalanced data based on the generative adversarial network (GAN) framework. In this method, a Gaussian distribution is estimated from the minority samples to supply prior knowledge of the minority class to the latent space of the GAN. To increase the randomness of the generated samples, the noise is drawn with a mixed strategy: some of the generator's noise follows the Gaussian distribution and the rest follows a random distribution. G-GAN is then trained, following the idea of Bagging, to generate dispersed positive samples, which helps avoid overfitting. G-GAN differs from prior work in that the GAN does not generate minority samples directly; instead, the distribution information of the minority samples is first added to the latent space of the GAN, and the minority samples are generated from it. Compared with 11 commonly used oversampling methods, G-GAN obtains promising results in terms of G-mean, AUC, F-measure and ROC with three classifiers on 11 benchmark imbalanced datasets. G-GAN is also validated on the AUC metric using a real imbalanced diabetes dataset. The results of these two numerical experiments indicate that G-GAN has strong potential for imbalanced classification.
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical
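
The mixed-noise strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixed_latent_noise` and the parameter `gaussian_fraction` are hypothetical, the "random distribution" is assumed here to be uniform on [-1, 1], the Gaussian is simplified to an independent per-feature fit, and the latent dimension is assumed to equal the number of features. GAN training and the Bagging step are not shown.

```python
import random
import statistics


def mixed_latent_noise(minority, n_samples, gaussian_fraction=0.5, seed=0):
    """Draw latent vectors, part from a Gaussian fitted to the minority
    class and part from a uniform distribution (both are assumptions)."""
    rng = random.Random(seed)

    # Per-feature mean and standard deviation of the minority samples.
    # Assumption: a diagonal Gaussian stands in for the estimated one.
    columns = list(zip(*minority))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]

    # Number of latent vectors drawn from the fitted Gaussian.
    n_gauss = round(gaussian_fraction * n_samples)

    noise = []
    for i in range(n_samples):
        if i < n_gauss:
            # Noise carrying prior knowledge of the minority class.
            vec = [rng.gauss(m, s) for m, s in zip(means, stds)]
        else:
            # Assumption: "random" noise is uniform on [-1, 1].
            vec = [rng.uniform(-1.0, 1.0) for _ in columns]
        noise.append(vec)

    # Mix the two kinds of noise so that batches contain both.
    rng.shuffle(noise)
    return noise
```

Calling `mixed_latent_noise(minority, n_samples=10, gaussian_fraction=0.6)` on a list of minority feature vectors would return ten latent vectors, six of them drawn around the minority-class statistics. In a full pipeline these vectors would be fed to the generator in place of purely random noise.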