Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao
2023-10-07
Abstract: Latent Diffusion Models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such an ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/
Computer Vision and Pattern Recognition, Machine Learning
### What problem does this paper attempt to address?

This paper addresses the high computational cost and slow generation speed of the iterative sampling process in high-resolution image generation. Specifically, it proposes a new method, **Latent Consistency Models (LCMs)**, to accelerate image generation with pre-trained latent diffusion models such as Stable Diffusion.

#### Main Objectives:

1. **Fast generation of high-resolution images**: Achieve efficient, high-quality image generation by applying consistency models in the latent space of pre-trained latent diffusion models (such as Stable Diffusion).
2. **Reduce iterative steps**: Cut the generation process, which originally required many iterations, down to a few steps or even a single step, thereby significantly improving generation speed.
3. **Maintain image quality**: Ensure that the quality of the generated images does not degrade as generation is accelerated.
4. **Conditional generation tasks**: Optimize specifically for text-to-image tasks, so that generated images closely match the given text descriptions.

#### Specific Contributions:

- Proposed LCMs and demonstrated their advantages in high-resolution image generation.
- Introduced a simple and efficient single-stage guided distillation method for distilling LCMs from pre-trained diffusion models.
- Introduced the **SKIPPING-STEP technique** to accelerate the convergence of LCMs.
- Proposed **Latent Consistency Fine-tuning (LCF)**, which allows pre-trained LCMs to be fine-tuned on specific datasets.

With these improvements, LCMs achieve state-of-the-art text-to-image generation performance on the LAION-5B-Aesthetics dataset, particularly in the few-step regime (2 to 4 steps).
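
To make the idea of few-step inference more concrete, the toy sketch below illustrates multistep consistency sampling in the style of the original Consistency Models work that LCMs build on. It is not the authors' implementation, and it makes several simplifying assumptions: the "data" is a 1-D Gaussian (so the PF-ODE solution is known in closed form and no network is trained), the noising process is the simple variance-exploding form `z_t = z_0 + t * noise` rather than the Stable Diffusion schedule, and there is no latent encoder, text condition, or guidance scale. The names `consistency_fn`, `multistep_sample`, `MU`, and `S` are hypothetical.

```python
import numpy as np

MU, S = 2.0, 0.5  # toy 1-D "data" distribution N(MU, S^2); illustrative values only


def consistency_fn(z, t):
    """Map a noisy point z at noise level t to the start of its PF-ODE trajectory.

    For Gaussian data under z_t = z_0 + t * noise this map is exact; in an LCM a
    trained network (also fed the text condition) would play this role.
    """
    return MU + (z - MU) * S / np.sqrt(S**2 + t**2)


def multistep_sample(n_samples, timesteps, rng):
    """Few-step sampling: denoise in one jump, re-noise to a smaller t, repeat."""
    t_max = timesteps[0]
    # Draw from the exact noisy marginal at t_max (possible only in this toy setup).
    z = MU + np.sqrt(S**2 + t_max**2) * rng.standard_normal(n_samples)
    x = consistency_fn(z, t_max)  # this alone is a 1-step sample
    for t in timesteps[1:]:
        z = x + t * rng.standard_normal(n_samples)  # re-noise to level t
        x = consistency_fn(z, t)  # jump back to a clean estimate
    return x


rng = np.random.default_rng(0)
# Four steps, mirroring the 2-to-4-step regime discussed above.
samples = multistep_sample(20_000, timesteps=[80.0, 10.0, 2.0, 0.5], rng=rng)
print(f"mean={samples.mean():.3f}, std={samples.std():.3f}")
```

Each loop iteration costs one evaluation of the consistency function, so the number of entries in `timesteps` is the number of inference steps. In this toy case every step already yields exact samples, so extra steps change nothing; with a learned, imperfect model the additional re-noise and denoise rounds are what trade compute for sample quality.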