Abstract: Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs. However, their slow inference speed hinders their potential for real-time applications. To address this, DiffusionGAN leveraged a conditional GAN to drastically reduce the number of denoising steps and speed up inference. Its successor, Wavelet Diffusion, further accelerated the process by converting data into wavelet space, thus enhancing efficiency. Nonetheless, these models still fall short of GANs in terms of speed and image quality. To bridge these gaps, this paper introduces the Latent Denoising Diffusion GAN, which employs pre-trained autoencoders to compress images into a compact latent space, significantly improving inference speed and image quality. Furthermore, we propose a Weighted Learning strategy to enhance diversity and image quality. Experimental results on the CIFAR-10, CelebA-HQ, and LSUN-Church datasets show that our model achieves state-of-the-art running speed among diffusion models. Compared to its predecessors, DiffusionGAN and Wavelet Diffusion, our model shows remarkable improvements in all evaluation metrics. Code and pre-trained checkpoints: https://github.com/thanhluantrinh/LDDGAN.git
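The few-step latent sampling described in the abstract can be sketched as follows. This is a minimal NumPy sketch, assuming LDDGAN keeps the Denoising Diffusion GAN parameterization in which a conditional generator predicts the clean latent x0 from (x_t, z, t) and x_{t-1} is then drawn from the Gaussian posterior q(x_{t-1} | x_t, x0). The noise schedule (a discretized VP schedule with beta_min = 0.1 and beta_max = 20), the four-step default, and the helpers `toy_generator` and `toy_decoder` are hypothetical stand-ins, not the trained networks or settings from the paper.

```python
import numpy as np


def vp_alpha_bars(num_steps: int, beta_min: float = 0.1, beta_max: float = 20.0) -> np.ndarray:
    """Return alpha_bar_0..alpha_bar_T for a discretized VP noise schedule (assumed constants)."""
    t = np.linspace(0.0, 1.0, num_steps + 1)
    log_mean_coeff = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    return np.exp(2.0 * log_mean_coeff)  # alpha_bar_0 == 1 (clean latent), decreasing in t


def posterior_sample(x0_pred, x_t, t, alpha_bars, rng):
    """Draw x_{t-1} ~ q(x_{t-1} | x_t, x0_pred), the Gaussian posterior of the forward process."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    beta_t = 1.0 - ab_t / ab_prev
    coef_x0 = beta_t * np.sqrt(ab_prev) / (1.0 - ab_t)
    coef_xt = (1.0 - ab_prev) * np.sqrt(1.0 - beta_t) / (1.0 - ab_t)
    var = beta_t * (1.0 - ab_prev) / (1.0 - ab_t)  # zero at t = 1, so the last step is deterministic
    mean = coef_x0 * x0_pred + coef_xt * x_t
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)


def sample_latent_ddgan(generator, decoder, latent_shape, num_steps=4, nz=100, seed=0):
    """Few-step sampling in latent space, followed by a single decoder pass to pixels."""
    rng = np.random.default_rng(seed)
    alpha_bars = vp_alpha_bars(num_steps)
    x_t = rng.standard_normal(latent_shape)  # x_T ~ N(0, I) in the autoencoder's latent space
    for t in range(num_steps, 0, -1):
        z = rng.standard_normal((latent_shape[0], nz))  # GAN noise lets p(x0 | x_t) be multimodal
        x0_pred = generator(x_t, z, t)  # conditional generator predicts the clean latent
        x_t = posterior_sample(x0_pred, x_t, t, alpha_bars, rng)
    return decoder(x_t)


# Hypothetical stand-ins for the trained generator and the pre-trained decoder.
def toy_generator(x_t, z, t):
    return np.tanh(x_t)  # ignores z and t; a real generator is a conditional network


def toy_decoder(latent):
    # Pretend decoder: keep 3 channels and upsample 8x by nearest-neighbour repetition.
    return latent[:, :3].repeat(8, axis=2).repeat(8, axis=3)


if __name__ == "__main__":
    images = sample_latent_ddgan(toy_generator, toy_decoder, latent_shape=(2, 4, 8, 8))
    print(images.shape)  # (2, 3, 64, 64)
```

In this toy setup each denoising step operates on a 4×8×8 latent (256 values) rather than a 3×64×64 image (12,288 values), 48× fewer elements, and the decoder runs only once at the end. This illustrates where the speed-up the abstract describes is expected to come from.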

What problem does this paper attempt to address?

The main aim of this paper is to address the following issues:

### Paper Objectives

- **Improve Sampling Speed**: Address the slow inference of diffusion models in image generation so that they can meet the needs of real-time applications.
- **Enhance Image Quality**: Improve the quality of images generated by diffusion models to match or surpass current state-of-the-art models (such as StyleGAN).

### Method Overview

To achieve these objectives, the authors propose the Latent Denoising Diffusion GAN (LDDGAN). The method rests on three components:

1. **Using a Pre-trained Autoencoder to Compress Input Images**: Input images are compressed into a low-dimensional latent space, which reduces computational cost and accelerates both training and inference.
2. **Employing Conditional Generative Adversarial Networks (GANs) for Denoising**: Conditional GANs model complex, multimodal denoising distributions, which permits larger denoising steps and therefore reduces the number of steps required.
3. **Proposing a Weighted Learning Strategy**: Adversarial loss and reconstruction loss are combined so that image diversity improves while image quality is preserved.

### Main Contributions

- A novel latent denoising diffusion GAN framework that leverages the compatibility of dimensionality reduction and a low-dimensional latent space with the denoising process of diffusion models, improving inference speed, image quality, and diversity.
- The finding that when the denoising process of a diffusion model does not rely on a Gaussian distribution, the autoencoder's dependence on learning a Gaussian distribution should also be removed to improve the diversity and quality of generated images.
- A new strategy, called weighted learning, that enhances diversity through the adversarial loss while improving image quality through the reconstruction loss.
- Lower training costs and state-of-the-art inference speed, paving the way for real-time, high-fidelity diffusion models.

### Experimental Results

- On standard benchmark datasets (CIFAR-10, CelebA-HQ, and LSUN Church), the model achieves the fastest running speed among diffusion models while maintaining high-quality image generation.
- Its image quality is comparable to that of GAN models, with higher diversity.
- It shows significant improvements over its predecessors, DiffusionGAN and Wavelet Diffusion, in most evaluation metrics.
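The weighted learning strategy described above can be illustrated with a small NumPy sketch. The summary states only that it combines an adversarial loss (for diversity) with a reconstruction loss (for quality). The specific form used here, L = (1 - w) L_adv + w L_rec with a reconstruction weight w annealed linearly from 1 to 0 by the hypothetical helper `reconstruction_weight`, and the L2 latent reconstruction term are assumptions for illustration, not LDDGAN's published formulation.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def reconstruction_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Hypothetical schedule: move the reconstruction weight linearly from w_start to w_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def weighted_generator_loss(d_fake_logits, x0_pred, x0_true, step, total_steps):
    """Assumed form of a weighted objective: (1 - w) * L_adv + w * L_rec."""
    adv = softplus(-d_fake_logits).mean()  # non-saturating GAN loss, -log sigmoid(D(fake))
    rec = np.mean((x0_pred - x0_true) ** 2)  # L2 reconstruction of the clean latent (assumed)
    w = reconstruction_weight(step, total_steps)
    loss = (1.0 - w) * adv + w * rec
    return loss, adv, rec, w


if __name__ == "__main__":
    logits = np.zeros(8)  # discriminator is undecided on every fake sample
    pred, true = np.ones((2, 4, 8, 8)), np.zeros((2, 4, 8, 8))
    for step in (0, 50, 100):
        loss, adv, rec, w = weighted_generator_loss(logits, pred, true, step, total_steps=100)
        print(step, round(w, 2), round(float(loss), 4))
```

Under this assumed schedule, early training leans on the stable reconstruction signal and later training on the adversarial term; a fixed weight or the opposite ordering is equally consistent with the summary, so the schedule should be checked against the paper and the released code.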

Latent Denoising Diffusion GAN: Faster sampling, Higher image quality
