DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion

Taesun Yeom, Minhyeok Lee
2023-06-06
Abstract: Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques; however, it continues to face challenges such as mode collapse, training instability, and low-quality output in cases of datasets with high intra-class variation. Furthermore, most GANs often converge in larger iterations, resulting in poor iteration efficacy in training procedures. While Diffusion-GAN has shown potential in generating realistic samples, it has a critical limitation in generating class-conditional samples. To overcome these limitations, we propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process. Our method consists of three unique networks: a discriminator, a generator, and a classifier. During the training process, Gaussian-mixture noises are injected into the two noise-aware networks, the discriminator and the classifier, in distinct ways. This noisy data helps to prevent overfitting by gradually introducing more challenging tasks, leading to improved model performance. As a result, our method outperforms state-of-the-art conditional GAN models for image generation in terms of performance. We evaluated our method using the AFHQ, Food-101, and CIFAR-10 datasets and observed superior results across metrics such as FID, KID, Precision, and Recall score compared with comparison models, highlighting the effectiveness of our approach.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses common challenges in class-conditional generative adversarial networks (cGANs): mode collapse, training instability, and low-quality output, especially on datasets with high intra-class variation. In addition, most GANs need a large number of iterations to converge, which makes training iteration-inefficient. Although Diffusion-GAN has shown potential for generating realistic samples, it has a critical limitation when generating class-conditional samples. To overcome these limitations, the authors propose DuDGAN, which introduces a dual diffusion-based noise injection process to improve the quality and stability of class-conditional image generation. DuDGAN consists of three networks: a discriminator, a generator, and a classifier. During training, Gaussian-mixture noise is injected, in distinct ways, into the two noise-aware networks, the discriminator and the classifier. The noisy data helps prevent overfitting by gradually introducing more challenging tasks, which improves model performance. The authors report that the method outperforms state-of-the-art conditional GAN models on the AFHQ, Food-101, and CIFAR-10 datasets, with better FID, KID, Precision, and Recall scores than the comparison models. According to the paper, DuDGAN therefore improves the quality and diversity of generated images while also converging in fewer iterations, addressing several key problems in conditional GAN training.
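The noise injection described above can be illustrated with a generic forward-diffusion step. The following is a minimal sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the function names `make_alpha_bar` and `diffuse`, the `sigma` scale, and the two `max_t` values are all hypothetical choices made for illustration. Drawing a random timestep per sample is what makes the resulting noise a Gaussian mixture, and raising `max_t` over training is one way the task could be made gradually harder.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse(images, alpha_bar, max_t, rng, sigma=1.0):
    """Add forward-diffusion noise with one random timestep per sample.

    Each sample gets its own t drawn uniformly from [0, max_t), so the
    batch as a whole sees a mixture of Gaussian noise levels.
    """
    batch = images.shape[0]
    t = rng.integers(0, max_t, size=batch)
    # Reshape so the per-sample factor broadcasts over channels and pixels.
    a = alpha_bar[t].reshape(batch, *([1] * (images.ndim - 1)))
    noise = rng.standard_normal(images.shape)
    noisy = np.sqrt(a) * images + np.sqrt(1.0 - a) * sigma * noise
    return noisy, t

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
images = rng.uniform(-1.0, 1.0, size=(4, 3, 8, 8))  # stand-in for an image batch

# Two independent injections, one per noise-aware network. The separate
# max_t values are illustrative assumptions, not settings from the paper.
noisy_d, t_d = diffuse(images, alpha_bar, max_t=50, rng=rng)   # discriminator input
noisy_c, t_c = diffuse(images, alpha_bar, max_t=200, rng=rng)  # classifier input
```

In a training loop, `noisy_d` and `t_d` would be passed to the discriminator and `noisy_c` and `t_c` to the classifier, so that each network is conditioned on the noise level it receives. How the paper differentiates the two injections is not detailed in this summary.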