Cluster-based Generative Adversarial Network Imbalanced Data Generation Method

Min Fan, Qing Yang, Bo Zhang, Meng Zhou, Ke Zhang, Jialu Xia
DOI: https://doi.org/10.1109/ddcls52934.2021.9455671
2021-01-01
Abstract: Data imbalance is widespread in data mining. It lowers the recognition rate of the minority class, even though misclassifying minority samples is costly. To improve the minority recognition rate, this paper proposes a data generation method for balancing data sets. First, the minority data are clustered and divided into sub-categories. Next, a GAN model is built to generate data for each sub-category separately, which improves the quality of the generated data, balances the number of majority and minority samples, and alleviates the imbalance within the minority category. Then, a cost-sensitive convolutional neural network model is built for classification. Finally, experiments on the UCI public data set show that the method achieves good classification performance.
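The balancing step described in the abstract can be illustrated with a small arithmetic sketch. It only computes how many synthetic samples a generator would need to produce for each minority sub-category; the clustering and the GAN itself are out of scope here. The abstract does not state the allocation rule, so the equal-target rule below, the function name `synthetic_counts_per_cluster`, and the example numbers are all assumptions made for illustration, not the authors' method.

```python
def synthetic_counts_per_cluster(cluster_sizes, majority_count):
    """Return how many synthetic samples to generate for each minority sub-cluster.

    Assumed rule (not taken from the paper): every sub-cluster is grown to the
    same target size, and the targets sum to the majority class size. This
    addresses both the between-class and the within-minority imbalance.
    """
    k = len(cluster_sizes)
    if k == 0:
        raise ValueError("at least one minority sub-cluster is required")

    # Split the majority count as evenly as possible across the k sub-clusters.
    base, extra = divmod(majority_count, k)
    targets = [base + (1 if i < extra else 0) for i in range(k)]

    # A sub-cluster already at or above its target receives no synthetic data.
    return [max(target - size, 0) for target, size in zip(targets, cluster_sizes)]


# Hypothetical example: 300 majority samples, minority split into 3 sub-clusters.
counts = synthetic_counts_per_cluster([30, 10, 5], 300)
print(counts)  # [70, 90, 95]
```

Under this assumed rule, the smallest sub-category receives the most synthetic samples, so the minority class ends up balanced internally as well as against the majority class. If a sub-category is already larger than its target, the final minority total exceeds the majority count, and a different allocation rule would be needed.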