Fast Sampling of Diffusion Models via Operator Learning

Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar
2023-07-22
Abstract: Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO) that maps the initial condition, i.e., Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers that are parameterized in the Fourier space into the given diffusion model backbone. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the slow sampling speed of diffusion models. Existing diffusion models need hundreds to thousands of network evaluations to simulate the continuous process defined by differential equations, which makes them much slower than other generative models such as Generative Adversarial Networks (GANs). The paper proposes a neural-operator-based method, diffusion model sampling with neural operator (DSNO), to accelerate sampling. Unlike existing fast sampling methods, which are sequential, DSNO decodes in parallel and generates images in a single model forward pass. The main contributions are:
1. DSNO, a model for fast sampling that generates high-quality images with only one model evaluation.
2. A temporal convolution block parameterized in Fourier space, which integrates easily with existing diffusion model architectures to form the DSNO backbone and adds only a small number of model parameters (about 10%).
3. What the authors describe as the first parallel decoding method, which uses a continuous function representation to produce the whole image trajectory, including the final sample, in one step.
4. State-of-the-art FID scores in the one-model-evaluation setting: 3.78 on CIFAR-10 and 7.83 on ImageNet-64.
In summary, the paper uses neural operators to solve the probability flow differential equations behind diffusion sampling, replacing many sequential solver steps with a single forward pass and significantly improving sampling efficiency.
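The Fourier-space temporal convolution described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `temporal_spectral_conv`, the tensor layout `(T, C, H, W)`, and the choice to keep only the lowest `modes` temporal frequencies are assumptions made for illustration, following the general pattern of Fourier neural operator layers. The idea is to transform the features along the time axis, mix channels with one learned complex matrix per retained frequency, and transform back, so every time point of the trajectory is produced together.

```python
import numpy as np


def temporal_spectral_conv(u, weights):
    """Illustrative FNO-style convolution along the time axis (hypothetical helper).

    u:       real array of shape (T, C_in, H, W), features at T time points.
    weights: complex array of shape (modes, C_in, C_out), one channel-mixing
             matrix per retained temporal frequency.
    Returns a real array of shape (T, C_out, H, W).
    """
    T = u.shape[0]
    modes, c_in, c_out = weights.shape
    if modes > T // 2 + 1:
        raise ValueError("modes exceeds the number of rFFT frequencies")
    if c_in != u.shape[1]:
        raise ValueError("weights do not match the input channel count")

    # Real FFT over time: shape (T // 2 + 1, C_in, H, W).
    u_hat = np.fft.rfft(u, axis=0)

    # Frequencies above `modes` are dropped (left at zero).
    out_hat = np.zeros((u_hat.shape[0], c_out) + u.shape[2:], dtype=complex)
    # Per-frequency channel mixing: out[k, o] = sum_i u_hat[k, i] * weights[k, i, o].
    out_hat[:modes] = np.einsum("kihw,kio->kohw", u_hat[:modes], weights)

    # Back to the time domain; all T time points come out at once.
    return np.fft.irfft(out_hat, n=T, axis=0)


rng = np.random.default_rng(0)
u = rng.standard_normal((4, 3, 8, 8))  # 4 time points, 3 channels, 8x8 images
w = rng.standard_normal((2, 3, 3)) + 1j * rng.standard_normal((2, 3, 3))
out = temporal_spectral_conv(u, w)
print(out.shape)  # (4, 3, 8, 8)
```

In a trained model the weights would be learned parameters and the layer would sit inside the diffusion backbone alongside its spatial layers; here they are random, so the sketch only demonstrates the shapes and the frequency-domain mixing.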