Abstract: A wide variety of deep generative models has been developed in the past decade. Yet these models often struggle to satisfy three key requirements simultaneously: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as existing models often trade some of them for others. In particular, denoising diffusion models have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be used in many real-world applications. In this paper, we argue that slow sampling in these models is fundamentally attributable to the Gaussian assumption in the denoising step, which is justified only for small step sizes. To enable denoising with large steps, and hence to reduce the total number of denoising steps, we propose to model the denoising distribution using a complex multimodal distribution. We introduce denoising diffusion generative adversarial networks (denoising diffusion GANs) that model each denoising step using a multimodal conditional GAN. Through extensive evaluations, we show that denoising diffusion GANs obtain sample quality and diversity competitive with original diffusion models while being 2000$\times$ faster on the CIFAR-10 dataset. Compared to traditional GANs, our model exhibits better mode coverage and sample diversity. To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively. Project page and code can be found at https://nvlabs.github.io/denoising-diffusion-gan
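
The claim that the Gaussian assumption holds only for small step sizes can be illustrated with a toy calculation. The sketch below is not taken from the paper or its code: it assumes a one-dimensional marginal q(x_{t-1}) that is an equal mixture of N(-1, s^2) and N(+1, s^2), a single forward step x_t ~ N(sqrt(1 - beta) x_{t-1}, beta), and hypothetical helper names `log_posterior` and `count_modes`. It evaluates the true denoising posterior q(x_{t-1} | x_t) on a grid and counts its modes.

```python
import math


def log_posterior(x_prev, x_t, beta, s=0.5):
    """Unnormalized log q(x_{t-1} | x_t) for a toy 1D bimodal marginal."""
    # Assumption: q(x_{t-1}) is an equal mixture of N(-1, s^2) and N(+1, s^2).
    a = -((x_prev - 1.0) ** 2) / (2 * s * s)
    b = -((x_prev + 1.0) ** 2) / (2 * s * s)
    m = max(a, b)
    log_prior = m + math.log(math.exp(a - m) + math.exp(b - m))
    # Forward diffusion step: x_t ~ N(sqrt(1 - beta) * x_{t-1}, beta).
    log_lik = -((x_t - math.sqrt(1.0 - beta) * x_prev) ** 2) / (2 * beta)
    return log_prior + log_lik


def count_modes(beta, x_t=0.0, n=601, lo=-3.0, hi=3.0):
    """Count strict local maxima of the posterior on a uniform grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    lp = [log_posterior(x, x_t, beta) for x in xs]
    return sum(
        1 for i in range(1, n - 1) if lp[i] > lp[i - 1] and lp[i] > lp[i + 1]
    )


if __name__ == "__main__":
    for beta in (0.01, 0.9):
        print(f"beta={beta}: {count_modes(beta)} mode(s)")
```

Under these assumptions, a small step (beta = 0.01) gives a posterior with a single mode, which a Gaussian can approximate reasonably well, while a large step (beta = 0.9) gives a posterior with two modes. This is the kind of multimodal denoising distribution that a single Gaussian cannot represent and that the abstract proposes to model with a conditional GAN.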

Denoising Speech Signals with Hifi-Coulomb-GANs

DENOISPEECH: DENOISING TEXT TO SPEECH WITH FRAME-LEVEL NOISE MODELING

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

Study of GANs for Noisy Speech Simulation from Clean Speech

Sdgan: Improve Speech Enhancement Quality by Information Filter

A Multi-Scale Generative Adversarial Network for Real-World Image Denoising

Deep Generative Adversarial Networks for the Sparse Signal Denoising

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

Generative Adversarial Networks with Denoising Penalty and Sample Augmentation

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

Learning Generative Models of Structured Signals from Their Superposition Using GANs with Application to Denoising and Demixing

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

HiFiDenoise: High-Fidelity Denoising Text to Speech with Adversarial Networks

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation

GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning

Robust Real-time Audio-Visual Speech Enhancement based on DNN and GAN