Data Augmentation Generated by Generative Adversarial Network for Small Sample Datasets Clustering

Hui Yu, Qiao Feng Wang, Jian Yu Shi
DOI: https://doi.org/10.1007/s11063-023-11315-z
IF: 2.565
2023-06-10
Neural Processing Letters
Abstract: In data mining, clustering performance depends heavily on the number of samples. However, obtaining enough data samples is difficult and expensive in some applications. To address this, data augmentation techniques such as oversampling have been adopted, but these methods focus mainly on local information in the data and do not consider its underlying distribution. This paper proposes a new data augmentation method, the Wasserstein Generative Adversarial Network based on the Gaussian Mixture Model (GMM_WGAN), which generates data for small-sample datasets to address insufficient dataset size in clustering. The method has two steps: first, a Gaussian Mixture Model captures the underlying distribution of the real dataset; second, a Wasserstein generative adversarial network generates data samples to expand the small dataset. Five clustering algorithms are used to evaluate GMM_WGAN, and it is compared with seven other data augmentation methods. Experiments on 10 small datasets show that the proposed approach achieves better results than the other methods on five evaluation metrics.
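As a rough illustration of the first step described in the abstract, the sketch below fits a one-dimensional Gaussian mixture to a toy small dataset with plain EM and then draws extra samples from the fitted mixture. It is not the authors' implementation: the function names (`fit_gmm_1d`, `sample_gmm_1d`), the toy data, the quantile-based initialisation, and all parameter values are assumptions made for illustration. In particular, the WGAN stage is omitted, and sampling directly from the mixture is only a stand-in for the generator.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def sample_gmm_1d(weights, means, variances, n, seed=1):
    """Draw n samples from the fitted mixture (stand-in for the WGAN generator)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return out


# Hypothetical small dataset: two well-separated clusters of 15 points each.
rng = random.Random(42)
small = [rng.gauss(-3.0, 0.5) for _ in range(15)] + [rng.gauss(3.0, 0.5) for _ in range(15)]

weights, means, variances = fit_gmm_1d(small, k=2)
synthetic = sample_gmm_1d(weights, means, variances, n=100)
augmented = small + synthetic  # enlarged dataset that would be passed to a clustering algorithm
```

In the method the abstract describes, the expanded samples come from a Wasserstein GAN rather than from direct mixture sampling; how the fitted mixture feeds into that network is not specified in the abstract, so it is left out here.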
computer science, artificial intelligence