Abstract: Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.
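The TD-learning view mentioned in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the scalar parameter `theta`, the function names `consistency_fn` and `td_consistency_loss`, and the noising rule `x_t = x0 + t * noise` are assumptions chosen for illustration. The prediction at a noisier time `t` is regressed onto a frozen prediction at a less noisy time `r < t`, much as a TD(0) update regresses a value estimate onto a bootstrapped target.

```python
import numpy as np

def consistency_fn(theta, x, t):
    """Toy consistency function with the boundary condition f(x, 0) = x."""
    # c_skip -> 1 and c_out -> 0 as t -> 0, so the output equals x at t = 0.
    c_skip = 1.0 / (1.0 + t ** 2)
    c_out = t / np.sqrt(1.0 + t ** 2)
    return c_skip * x + c_out * (theta * x)  # theta * x stands in for a network

def td_consistency_loss(theta, x0, noise, t, r):
    """Squared error between the prediction at t and a bootstrapped target at r < t."""
    x_t = x0 + t * noise  # noisier state (assumed noising rule)
    x_r = x0 + r * noise  # less noisy state along the same noise direction
    pred = consistency_fn(theta, x_t, t)
    # In an autodiff framework the target would be wrapped in a stop-gradient.
    target = consistency_fn(theta, x_r, r)
    return float(np.mean((pred - target) ** 2))

x0 = np.array([1.0, -2.0])
noise = np.array([0.5, 0.25])
loss = td_consistency_loss(0.3, x0, noise, t=1.0, r=0.5)
```

When `r` is 0 the target reduces to the clean sample `x0`, which plays the role of the terminal state that anchors the bootstrapped estimates at all noisier times.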

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The main problem this paper attempts to solve is to improve the performance and training stability of Consistency Models. Specifically:

1. **Trade-off between generation speed and quality**:
   - Diffusion Models excel in generation quality but have slower generation speeds due to their iterative denoising nature.
   - Consistency Models can significantly speed up generation while maintaining high generation quality. However, existing training methods have some limitations, such as unstable training and lower performance ceilings.
2. **Improving training stability and performance**:
   - Existing training methods for Consistency Models (such as Consistency Distillation and Consistency Training/Tuning) suffer from high training variance and discretization errors, leading to unstable training and limited performance.
   - The paper proposes a new framework that models the denoising process of Diffusion Models as a Markov Decision Process (MDP) and views the training of Consistency Models as Temporal Difference Learning (TD Learning) to understand and improve the training strategies of Consistency Models.
3. **Optimization of multi-step generation**:
   - To further improve generation quality and stability, the paper explores multi-step training and inference methods for Consistency Models, including an edge-skipping multi-step inference strategy to overcome optimization challenges at edge time steps.

### Main Contributions

1. **Proposing Stable Consistency Tuning (SCT)**:
   - By introducing a variance-reduced learning objective and a smoother progressive training schedule, SCT significantly improves performance in benchmark tests, setting new records such as 1-step FID 2.42 and 2-step FID 1.55 on ImageNet-64.
2. **Improvements in multi-step generation**:
   - Proposes an edge-skipping multi-step inference strategy to address optimization challenges at edge time steps in multi-step Consistency Models, further enhancing the fidelity and stability of generated results.
3. **Validation of the effectiveness of unconditional guidance**:
   - Demonstrates that unconditional Consistency Models can be guided by their suboptimal versions to improve sample quality.

### Experimental Results

- **Training efficiency and effectiveness**:
  - SCT significantly improves convergence speed compared to ECT under the same training configuration and shows better performance in multiple benchmark tests.
- **Quantitative evaluation**:
  - On datasets like CIFAR-10 and ImageNet-64, SCT's one-step and two-step FID scores outperform ECT and are close to or exceed the performance of some advanced distillation strategies and diffusion/score models.

Through these improvements, the paper provides new perspectives and methods for the training and application of Consistency Models, further advancing the development of generative models.
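The summary does not spell out the variance-reduced objective, so the following is only a sketch of one way such a target could be formed, not the authors' method. It assumes a noising rule `x_t = x0 + t * eps` and treats a reference batch as the data distribution; the names `posterior_mean_estimate` and `score_estimate` are hypothetical. Instead of using the single clean sample that produced `x_t` as the denoising target, it averages over the batch with self-normalized Gaussian weights, which approximates the posterior mean of the clean sample given `x_t` and lowers the variance of the resulting score estimate.

```python
import numpy as np

def posterior_mean_estimate(x_t, t, x0_batch):
    """Self-normalized estimate of E[x0 | x_t] over a reference batch.

    Assumes x_t = x0 + t * eps with standard Gaussian eps (an assumption).
    """
    diffs = x_t[None, :] - x0_batch                      # shape (N, D)
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * t ** 2)  # Gaussian log-likelihoods
    log_w = log_w - log_w.max()                           # guard against underflow
    w = np.exp(log_w)
    w = w / w.sum()                                       # weights sum to 1
    return w @ x0_batch                                   # weighted mean, shape (D,)

def score_estimate(x_t, t, x0_batch):
    """Score estimate from the denoised mean: (E[x0 | x_t] - x_t) / t^2."""
    return (posterior_mean_estimate(x_t, t, x0_batch) - x_t) / t ** 2

x0_batch = np.array([[0.0, 0.0], [10.0, 10.0]])
x_t = np.array([0.1, -0.1])
low_noise_target = posterior_mean_estimate(x_t, 0.5, x0_batch)
high_noise_target = posterior_mean_estimate(x_t, 1e6, x0_batch)
```

With a batch of one sample the estimate collapses to that sample, which recovers the ordinary single-sample target. At low noise the weights concentrate on the nearest clean sample, and at very high noise they approach a uniform average over the batch.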

Stable Consistency Tuning: Understanding and Improving Consistency Models

Improved Techniques for Training Consistency Models

Truncated Consistency Models

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Consistency Models Made Easy

Chasing Consistency in Text-to-3D Generation from a Single Image

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

Multistep Consistency Models

Poisson flow consistency models for low-dose CT image denoising

See Further When Clear: Curriculum Consistency Model

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Direct Consistency Optimization for Robust Customization of Text-to-Image Diffusion Models

Improved Consistency Regularization for GANs

Consistency Diffusion Bridge Models

Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Bidirectional Consistency Models

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness