Study of Constrained Network Structures for WGANs on Numeric Data Generation

Wei Wang, Chuang Wang, Tao Cui, Yue Li
DOI: https://doi.org/10.1109/ACCESS.2020.2993839
2019-11-05
Abstract: Some recent studies have suggested using GANs for numeric data generation, for example to generate data that completes imbalanced numeric datasets. Considering the significant difference between the dimensions of numeric data and images, as well as the strong correlations between features of numeric data, conventional GANs normally face an overfitting problem, which consequently leads to an ill-conditioning problem in generating numeric and structured data. This paper studies constrained network structures between the generator G and the discriminator D in WGAN, and designs several structures, including isomorphic, mirror, and self-symmetric structures. We evaluate the performance of the constrained WGANs in data augmentation, taking the non-constrained GANs and WGANs as baselines. The evaluation comprises twenty experiments on four UCI Machine Learning Repository datasets (Australian Credit Approval, German Credit, Pima Indians Diabetes, and SPECT heart data), each tested with five conventional classifiers. The constrained structures improve results in 17/20 experiments; in particular, Isomorphic WGAN is the best in 15/20 experiments. Finally, we theoretically prove the effectiveness of the constrained structures by directed graphical model (DGM) analysis.
Machine Learning
What problem does this paper attempt to address?
The problems this paper attempts to solve are the overfitting and ill-conditioning problems that traditional Generative Adversarial Networks (GANs) face when generating numeric data. These problems lower the quality of the generated numeric data, and the results are especially poor on imbalanced datasets. Specifically:

1. **Differences between numeric data and image data**:
   - Numeric data usually has a lower dimension. For example, the Pima Indians Diabetes dataset has 8 dimensions, and the SPECT Heart dataset has 22 dimensions.
   - The correlations between the dimensions of numeric data are relatively strong, and the value of each dimension usually has a specific meaning (such as age or income), while the pixel values of image data usually do not have a clear practical meaning.
2. **Deficiencies of traditional GANs**:
   - Because of these characteristics, traditional GANs are prone to overfitting when generating numeric data. The resulting poor-quality generated data in turn degrades the performance of classifiers trained on it.
   - Compared with traditional data generation methods (such as SMOTE), GAN-based methods do not show obvious advantages on some datasets.
3. **Proposed solutions**:
   - By introducing constrained network structures, the paper designs several WGANs (Wasserstein GANs) with specific structures, including an isomorphic structure, a mirror structure, and a self-symmetric structure, to improve the quality of generated numeric data.
   - These structures aim to provide additional constraints on the learning processes of the generator G and the discriminator D, thereby improving the quality of generated data and the performance of classifiers.

In summary, the main goal of this paper is to solve the overfitting and ill-conditioning problems encountered when generating numeric data by improving the network structure of WGAN, thereby improving the quality of generated data and the performance of classifiers.
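To make the idea of a structural constraint between G and D concrete, here is a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper: "isomorphic" is read here as G and D sharing the same hidden-layer widths (the summary above does not define the term), the widths `[16, 16]` and the noise dimension are hypothetical, and the critic uses the weight clipping and loss of the original WGAN formulation, which may differ from the authors' setup. The helper names (`init_mlp`, `forward`, `clip_weights`, `critic_loss`) are illustrative only.

```python
import random

def init_mlp(widths, rng):
    """Build a fully connected net as a list of (weights, biases) layers."""
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)],
             [0.0] * n_out)
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(layers, x):
    """Apply the net to one sample: ReLU on hidden layers, linear output."""
    for i, (W, b) in enumerate(layers):
        x = [sum(x[j] * W[j][k] for j in range(len(x))) + b[k]
             for k in range(len(b))]
        if i < len(layers) - 1:
            x = [max(v, 0.0) for v in x]
    return x

def clip_weights(layers, c=0.01):
    """Weight clipping as in the original WGAN (an assumption for this sketch)."""
    def clip(v):
        return max(-c, min(c, v))
    return [([[clip(w) for w in row] for row in W], [clip(v) for v in b])
            for W, b in layers]

def critic_loss(critic, real, fake):
    """WGAN critic loss: mean D(fake) - mean D(real), minimized by the critic."""
    def mean_score(batch):
        return sum(forward(critic, x)[0] for x in batch) / len(batch)
    return mean_score(fake) - mean_score(real)

rng = random.Random(0)
n_features = 8            # e.g. Pima Indians Diabetes has 8 dimensions
noise_dim = 8             # hypothetical choice
hidden = [16, 16]         # hypothetical hidden widths shared by G and D

g_widths = [noise_dim] + hidden + [n_features]
d_widths = [n_features] + hidden + [1]   # assumed "isomorphic": same hidden widths as G

generator = init_mlp(g_widths, rng)
critic = clip_weights(init_mlp(d_widths, rng))

batch = 4
real = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(batch)]  # stand-in data
fake = [forward(generator, [rng.gauss(0.0, 1.0) for _ in range(noise_dim)])
        for _ in range(batch)]
loss = critic_loss(critic, real, fake)
```

Under this reading, the constraint simply removes D's hidden widths as a free design choice: once G's hidden layers are fixed, D's follow. A mirror or self-symmetric variant would derive `d_widths` or `hidden` by a different rule, which this sketch does not attempt to reproduce.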