Abstract: Latent diffusion models (LDMs), such as Stable Diffusion, often experience significant structural distortions when directly generating high-resolution (HR) images that exceed their original training resolutions. A straightforward and cost-effective solution is to adapt pre-trained LDMs for HR image generation; however, existing methods often suffer from poor image quality and long inference time. In this paper, we propose an Attentive and Progressive LDM (AP-LDM), a novel, training-free framework aimed at enhancing HR image quality while accelerating the generation process. AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising. The first stage generates a latent representation of a higher-quality training-resolution image through the proposed attentive guidance, which utilizes a novel parameter-free self-attention mechanism to enhance the structural consistency. The second stage progressively performs upsampling in pixel space, alleviating the severe artifacts caused by latent space upsampling. Leveraging the effective initialization from the first stage enables denoising at higher resolutions with significantly fewer steps, enhancing overall efficiency. Extensive experimental results demonstrate that AP-LDM significantly outperforms state-of-the-art methods, delivering up to a 5x speedup in HR image generation, thereby highlighting its substantial advantages for real-world applications. Code is available at https://github.com/kmittle/AP-LDM.
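The second stage described in the abstract (upsample in pixel space, then denoise at the higher resolution with few steps) can be illustrated with a toy sketch. This is not the authors' implementation: `encode`, `decode`, `denoise`, `progressive_hr`, the factor-of-8 autoencoder, the upsampling schedule, and the re-noising step are all hypothetical stand-ins chosen so the control flow runs with NumPy alone, and the attentive guidance of the first stage is omitted.

```python
import numpy as np

SCALE = 8  # assumed latent-to-pixel factor of the stand-in autoencoder


def decode(latent):
    """Stand-in decoder: expand each latent cell to a SCALE x SCALE pixel block."""
    return latent.repeat(SCALE, axis=0).repeat(SCALE, axis=1)


def encode(image):
    """Stand-in encoder: average-pool SCALE x SCALE pixel blocks into latent cells."""
    h, w = image.shape[0] // SCALE, image.shape[1] // SCALE
    return image[: h * SCALE, : w * SCALE].reshape(h, SCALE, w, SCALE).mean(axis=(1, 3))


def upsample_pixels(image, factor):
    """Nearest-neighbour upsampling in pixel space by an integer factor."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


def denoise(latent, steps):
    """Placeholder denoiser: a real LDM would run `steps` sampler iterations here."""
    for _ in range(steps):
        latent = 0.5 * (latent + latent.mean())  # toy smoothing, not a real sampler
    return latent


def progressive_hr(latent, factors, steps_per_stage, noise_scale=0.1, seed=0):
    """Grow a training-resolution latent through a schedule of pixel-space upsamplings."""
    rng = np.random.default_rng(seed)
    for factor in factors:
        image = upsample_pixels(decode(latent), factor)  # upsample in pixel space
        latent = encode(image)                           # return to latent space
        # Assumption: re-noise lightly so the denoiser has something to refine.
        latent = latent + noise_scale * rng.standard_normal(latent.shape)
        latent = denoise(latent, steps_per_stage)        # few steps per stage
    return latent


base = np.random.default_rng(1).standard_normal((64, 64))  # pretend stage-one output
hr = progressive_hr(base, factors=[2, 2], steps_per_stage=5)
print(hr.shape, decode(hr).shape)
```

With two 2x stages, the 64x64 stand-in latent grows to 256x256, which decodes to a 2048x2048 image under the assumed factor of 8. The point of the sketch is only the ordering of operations: resizing happens on decoded pixels rather than on the latent itself, and each stage starts from the previous result instead of from pure noise.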

Pixel-Space Post-Training of Latent Diffusion Models

High-Resolution Image Synthesis with Latent Diffusion Models

Boosting Latent Diffusion with Perceptual Objectives

AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

L2DM: A Diffusion Model for Low-Light Image Enhancement

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Diffusion Models Without Attention

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Towards Accurate Post-training Quantization for Diffusion Models

LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models

TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

Effective Diffusion Transformer Architecture for Image Super-Resolution

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

LaDiffGAN: Training GANs with Diffusion Supervision in Latent Spaces

Explore In-Context Segmentation via Latent Diffusion Models

LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation

FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification