Multistep Consistency Models

Jonathan Heek, Emiel Hoogeboom, Tim Salimans

2024-06-03

Abstract: Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train, but generate samples in a single step. In this paper we propose Multistep Consistency Models: a unification of Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion model, giving a trade-off between sampling speed and sampling quality. Specifically, a 1-step consistency model is a conventional consistency model, whereas an $\infty$-step consistency model is a diffusion model. Multistep Consistency Models work really well in practice. By increasing the sample budget from a single step to 2-8 steps, we can more easily train models that generate higher quality samples, while retaining much of the sampling speed benefit. Notable results are 1.4 FID on ImageNet 64 in 8 steps and 2.1 FID on ImageNet 128 in 8 steps with consistency distillation, using simple losses without adversarial training. We also show that our method scales to a text-to-image diffusion model, generating samples that are close to the quality of the original model.

Machine Learning, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the fact that diffusion models need many steps to generate samples, which makes sampling slow and computationally expensive. Consistency models cut sampling to a single step, but they are harder to train and give up image quality. The paper therefore proposes Multistep Consistency Models, which interpolate between consistency models and diffusion models to trade sampling speed against sample quality. Specifically, the goals of the paper include:

1. **Improving generation quality**: Raising the sampling budget from 1 step to 2-8 steps yields higher quality samples while retaining much of the sampling speed benefit.
2. **Reducing training difficulty**: Compared to conventional single-step consistency models, Multistep Consistency Models are easier to train and can reach performance close to standard diffusion models with far fewer steps.
3. **Expanding application scope**: The method applies not only to class-conditional image generation but also to text-to-image generation, where the generated samples are close in quality to those of the original model.

The paper pursues these goals by combining consistency training and consistency distillation with an improved deterministic sampler (Adjusted DDIM). With consistency distillation, the method reaches FID scores of 1.4 on ImageNet 64 and 2.1 on ImageNet 128 at 8 sampling steps. In text-to-image generation, it produces samples comparable to those of the teacher model.
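The few-step sampling loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine noise schedule, the function names, and the constant `toy_denoise` stand-in for a trained model are all assumptions made for illustration, and the update is plain DDIM rather than the paper's Adjusted DDIM. The point is only that the sampler calls the denoiser once per step, so the step count directly sets the compute budget.

```python
import numpy as np

def alpha_sigma(t):
    """Assumed variance-preserving cosine schedule (alpha^2 + sigma^2 = 1)."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def ddim_step(x_hat, z_t, t, s):
    """Plain deterministic DDIM update from time t to an earlier time s."""
    alpha_t, sigma_t = alpha_sigma(t)
    alpha_s, sigma_s = alpha_sigma(s)
    eps_hat = (z_t - alpha_t * x_hat) / sigma_t  # noise implied by x_hat
    return alpha_s * x_hat + sigma_s * eps_hat

def multistep_sample(denoise, shape, steps, rng):
    """Run `steps` denoiser calls, moving from t = 1 down to t = 0."""
    z = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i in range(steps, 0, -1):
        t, s = i / steps, (i - 1) / steps
        x_hat = denoise(z, t)            # model's estimate of the clean sample
        z = ddim_step(x_hat, z, t, s)    # step to the next, less noisy time
    return z

def toy_denoise(z, t):
    """Hypothetical stand-in for a trained model: always predicts 0.5."""
    return np.full_like(z, 0.5)

sample = multistep_sample(toy_denoise, (4,), steps=8, rng=np.random.default_rng(0))
print(sample)
```

With `steps=1` the loop reduces to a single denoiser call, which corresponds to the conventional consistency-model setting; larger values of `steps` spend more compute per sample.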

Truncated Consistency Models

Improved Techniques for Training Consistency Models

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Provable Statistical Rates for Consistency Diffusion Models

Towards a mathematical theory for consistency training in diffusion models

Consistency Models Made Easy

Consistency Diffusion Bridge Models

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Stable Consistency Tuning: Understanding and Improving Consistency Models

See Further When Clear: Curriculum Consistency Model

Multistep Distillation of Diffusion Models via Moment Matching

Convergence guarantee for consistency models

Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples

On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

Improving Consistency Models with Generator-Induced Flows

Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Bidirectional Consistency Models

CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems