Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao

2023-10-07

Abstract: Latent Diffusion Models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such an ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 LCM with 2 to 4 sampling steps takes only 32 A100 GPU hours to train. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/
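
The abstract's central idea can be written in symbols. The block below is a hedged sketch in notation commonly used for classifier-free guidance and consistency models. The symbols ($z_t$ for the noisy latent, $c$ for the text condition, $\omega$ for the guidance scale, $\epsilon_\theta$ for the noise predictor, $f_\theta$ for the consistency function) are assumptions introduced here and do not appear in the text above.

```latex
% Classifier-free guidance: combine conditional and unconditional noise predictions.
\tilde{\epsilon}_\theta(z_t, \omega, c, t)
  = (1 + \omega)\,\epsilon_\theta(z_t, c, t) - \omega\,\epsilon_\theta(z_t, \emptyset, t)

% A latent consistency model maps any point on a guided PF-ODE trajectory
% directly to that trajectory's solution z_0 ...
f_\theta(z_t, \omega, c, t) \approx z_0

% ... so its outputs should agree for any two points t, t' on the same trajectory.
f_\theta(z_t, \omega, c, t) = f_\theta(z_{t'}, \omega, c, t')
```

Under this reading, one network evaluation replaces the many solver steps that would otherwise be needed to integrate the ODE from $t$ down to $0$.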

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

This paper addresses the high computational cost and slow generation speed of iterative sampling in high-resolution image synthesis. Specifically, it proposes a new method, **Latent Consistency Models (LCMs)**, which accelerates image generation on top of pre-trained latent diffusion models such as Stable Diffusion.

#### Main Objectives:

1. **Fast generation of high-resolution images**: Achieve efficient, high-quality image generation by applying consistency models in the latent space of pre-trained latent diffusion models.
2. **Reduce iterative steps**: Cut the generation process, which originally required many iterations, down to a few steps or even a single step, significantly improving generation speed.
3. **Maintain image quality**: Ensure that the quality of the generated images does not degrade as generation is accelerated.
4. **Conditional generation tasks**: Optimize specifically for text-to-image generation, so that the generated images closely match the given text descriptions.

#### Specific Contributions:

- Proposed LCMs and demonstrated their advantages in high-resolution image generation.
- Introduced a simple and efficient single-stage guided distillation method for distilling LCMs from pre-trained diffusion models.
- Introduced the **SKIPPING-STEP technique** to accelerate the convergence of LCMs.
- Proposed **Latent Consistency Fine-tuning (LCF)**, which allows pre-trained LCMs to be fine-tuned on specific datasets.

With these improvements, LCMs achieve state-of-the-art text-to-image generation performance on the LAION-5B-Aesthetics dataset, excelling in particular when only a small number of sampling steps (2 to 4) is used.
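
The few-step generation described above can be illustrated with a minimal sketch of multistep consistency sampling: predict a clean sample from noise in one call, then optionally re-noise it to a smaller timestep and predict again. This is a toy under stated assumptions, not the authors' implementation. The cosine schedule `alpha_sigma`, the stand-in `gaussian_consistency_fn` (an exact consistency function for Gaussian toy data rather than a trained network), and all function names are hypothetical. Text conditioning and the guidance scale are omitted for brevity.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Latent = List[float]


def alpha_sigma(t: float) -> Tuple[float, float]:
    """Toy variance-preserving schedule on t in [0, 1] (an assumption)."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def gaussian_consistency_fn(z_t: Latent, t: float,
                            mu: float = 2.0, s: float = 0.5) -> Latent:
    """Stand-in for a trained LCM: exact map z_t -> z_0 when each data
    coordinate is N(mu, s^2), so the marginal at t is
    N(alpha*mu, alpha^2*s^2 + sigma^2)."""
    alpha, sigma = alpha_sigma(t)
    std_t = math.sqrt(alpha * alpha * s * s + sigma * sigma)
    return [mu + s * (z - alpha * mu) / std_t for z in z_t]


def multistep_consistency_sample(
    consistency_fn: Callable[[Latent, float], Latent],
    timesteps: Sequence[float],
    dim: int,
    rng: random.Random,
) -> Latent:
    """Few-step sampling: one consistency_fn call per entry in `timesteps`,
    which must be listed from largest (noisiest) to smallest."""
    if not timesteps:
        raise ValueError("need at least one timestep")
    # Start from pure noise and jump straight to a clean estimate.
    z_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    z_0 = consistency_fn(z_t, timesteps[0])
    for t in timesteps[1:]:
        # Re-noise the estimate to the smaller timestep t, then predict again.
        alpha, sigma = alpha_sigma(t)
        z_t = [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in z_0]
        z_0 = consistency_fn(z_t, t)
    return z_0


if __name__ == "__main__":
    steps = [1.0, 0.75, 0.5, 0.25]  # four sampling steps
    print(multistep_consistency_sample(
        gaussian_consistency_fn, steps, dim=4, rng=random.Random(0)))
```

Passing a single timestep gives one-step generation. Longer lists trade extra network calls for refinement, which corresponds to the 2 to 4 step regime mentioned above.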

TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps

VideoLCM: Video Latent Consistency Model

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

Reward Guided Latent Consistency Distillation

Phased Consistency Models

High-Resolution Image Synthesis with Latent Diffusion Models

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model

Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation

Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models

AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation

Consistency Models Made Easy

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Bidirectional Consistency Models

Explore In-Context Segmentation via Latent Diffusion Models

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis

AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion