Abstract: We present a new method for making diffusion models faster to sample. The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data given noisy data along the sampling trajectory. Our approach extends recently proposed one-step methods to the multi-step case, and provides a new perspective by interpreting these approaches in terms of moment matching. By using up to 8 sampling steps, we obtain distilled models that outperform not only their one-step versions but also their original many-step teacher models, obtaining new state-of-the-art results on the ImageNet dataset. We also show promising results on a large text-to-image model where we achieve fast generation of high-resolution images directly in image space, without needing autoencoders or upsamplers.

### What problem does this paper attempt to solve?

This paper addresses the sampling cost of diffusion models when generating high-dimensional data such as images, video, and audio. Diffusion models produce high-quality samples, but sampling usually requires hundreds of neural network evaluations, which makes them expensive to use in practice. The authors propose distilling a many-step diffusion model into a few-step model by matching the conditional expectations of the clean data given noisy data at different noise levels. The method speeds up sampling and, in some cases, the distilled model outperforms the original many-step teacher.

### Main contributions of the paper

1. **Multi-step distillation method**: The authors propose a new multi-step distillation method that reduces the number of sampling steps by matching conditional expectations, thereby accelerating generation.
2. **Theoretical explanation**: The method reinterprets existing one-step distillation methods as moment matching and extends them to the multi-step case.
3. **Performance improvement**: Using up to 8 sampling steps, the distilled model outperforms both its one-step version and the original many-step teacher model, achieving new state-of-the-art results on the ImageNet dataset.
4. **Text-to-image generation**: The authors apply the method to a large text-to-image model, which generates high-resolution images quickly and directly in image space, without autoencoders or upsamplers.

### Specific technical details

- **Background introduction**:
  - Diffusion models generate high-dimensional data through a step-by-step denoising process.
  - The sampling process usually requires hundreds of neural network evaluations, resulting in high computational cost.
  - Existing distillation methods can be divided into two types: deterministic and distributional.
- **Moment-matching distillation**:
  - A many-step diffusion model is distilled into a few-step model by matching the conditional expectations of the clean data given noisy data at different noise levels.
  - There are two variants: alternating optimization and parameter-space moment matching.
  - The alternating optimization variant approximates the conditional expectation with an auxiliary denoising model.
  - The parameter-space variant performs moment matching directly in parameter space, avoiding the additional auxiliary model.
- **Experimental results**:
  - On the ImageNet dataset, the distilled model with 8 sampling steps outperforms the original many-step model in terms of FID.
  - On the text-to-image task, the distilled model generates high-quality images quickly.

### Conclusion

This paper proposes a multi-step distillation method that substantially reduces the sampling cost of diffusion models while maintaining or even improving generation quality. The method achieves state-of-the-art results on ImageNet and shows promising results on text-to-image generation.
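To make the idea of few-step sampling with moment matching more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a cosine variance-preserving noise schedule and an ancestral sampler in which a generator predicts the clean data and the next noisy state is drawn from the Gaussian posterior given that prediction. The names `alpha_sigma`, `posterior_sample`, `few_step_sample`, and `moment_gap` are hypothetical and chosen only for illustration; none of them comes from the summary above.

```python
import numpy as np

def alpha_sigma(t):
    """Cosine variance-preserving schedule (an assumption): alpha^2 + sigma^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def posterior_sample(z_t, x_hat, t, s, rng):
    """Draw z_s ~ q(z_s | z_t, x = x_hat) for s < t under a Gaussian diffusion."""
    a_t, sig_t = alpha_sigma(t)
    a_s, sig_s = alpha_sigma(s)
    a_ts = a_t / a_s                          # alpha_{t|s}
    var_ts = sig_t**2 - a_ts**2 * sig_s**2    # sigma_{t|s}^2
    mean = (a_ts * sig_s**2 / sig_t**2) * z_t + (a_s * var_ts / sig_t**2) * x_hat
    std = np.sqrt(max(var_ts * sig_s**2 / sig_t**2, 0.0))
    return mean + std * rng.standard_normal(z_t.shape)

def few_step_sample(generator, shape, num_steps=8, seed=0):
    """Ancestral sampling with a few-step generator that predicts clean data."""
    rng = np.random.default_rng(seed)
    times = np.linspace(1.0, 0.0, num_steps + 1)  # t = 1 (pure noise) down to t = 0
    z = rng.standard_normal(shape)
    for t, s in zip(times[:-1], times[1:]):
        x_hat = generator(z, t)                   # predicted clean data at this step
        z = posterior_sample(z, x_hat, t, s, rng)
    return z

def moment_gap(aux_pred, teacher_pred):
    """Squared gap between two estimates of E[x | z_s].

    aux_pred stands in for the auxiliary denoiser's estimate on generator
    samples and teacher_pred for the teacher's estimate; training aims to
    drive this gap toward zero.
    """
    return float(np.mean((aux_pred - teacher_pred) ** 2))

# Toy usage: a stand-in "generator" that always predicts the constant 2.0.
toy_generator = lambda z, t: np.full_like(z, 2.0)
samples = few_step_sample(toy_generator, shape=(4, 3), num_steps=8)
```

In this toy setup the last step lands exactly on the generator's prediction, because the posterior variance vanishes at `s = 0`. A real generator would be a neural network, and the gap would be estimated on noisy samples drawn along the sampling trajectory.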

Multistep Distillation of Diffusion Models via Moment Matching
