AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Zhihang Lin, Mingbao Lin, Meng Zhao, Rongrong Ji

2024-07-18

Abstract: This paper attempts to address the object repetition issue in patch-wise higher-resolution image generation. We propose AccDiffusion, an accurate method for patch-wise higher-resolution image generation without training. An in-depth analysis in this paper reveals that an identical text prompt for different patches causes repeated object generation, while using no prompt compromises the image details. Therefore, our AccDiffusion, for the first time, proposes to decouple the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each of which serves as a more precise description of an image patch. In addition, AccDiffusion introduces dilated sampling with window interaction for better global consistency in higher-resolution image generation. Experimental comparison with existing methods demonstrates that AccDiffusion effectively addresses the issue of repeated object generation and leads to better performance in higher-resolution image generation.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper attempts to address the issue of object repetition in patch-wise high-resolution image generation. Existing methods (such as MultiDiffusion and DemoFusion) tend to generate repeated objects because the same text prompt is applied to every image patch during generation. These methods also suffer from poor global consistency when generating high-resolution images.

To tackle these issues, the authors propose AccDiffusion, a training-free method whose main innovations are:

1. **Decoupling Image-Content-Aware Prompts**: The usual image-content-aware prompt is decomposed into a set of patch-content-aware prompts, each describing the content of its corresponding patch more accurately. Cross-attention maps from the low-resolution generation process are used to decide whether each word should be part of the prompt for a specific patch, which avoids object repetition.
2. **Dilated Sampling with Window Interaction**: A new dilated sampling method is introduced, with window interaction to enhance global consistency. A bijective function enables interaction between different samples during the denoising process, which yields smoother global semantic information.

Experimental comparison with existing methods shows that AccDiffusion effectively reduces object repetition in high-resolution image generation and performs better in both quantitative and qualitative evaluations.
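The two ideas above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mean-value threshold used to binarize each attention map, the `area_ratio` cutoff, and the choice of an independent random permutation per window as the bijective function are all assumptions made for illustration. The first function keeps a word for a patch only if that word's low-resolution cross-attention map responds strongly inside the patch; the other two shuffle positions inside each `stride x stride` window before splitting the latent into dilated samples, then invert the shuffle when merging.

```python
import numpy as np

def patch_prompts(words, attn_maps, patch_boxes, area_ratio=0.1):
    """Build one prompt per patch from per-word cross-attention maps.

    words:       list of N prompt words
    attn_maps:   array (N, H, W), one low-resolution attention map per word
    patch_boxes: list of (top, left, height, width) in attention-map coordinates
    area_ratio:  hypothetical cutoff on the fraction of the patch a word must cover
    """
    prompts = []
    for top, left, h, w in patch_boxes:
        kept = []
        for word, amap in zip(words, attn_maps):
            # Assumption: binarize each map at its own mean value.
            mask = amap > amap.mean()
            coverage = mask[top:top + h, left:left + w].mean()
            if coverage > area_ratio:
                kept.append(word)
        prompts.append(" ".join(kept))
    return prompts

def dilated_samples_with_interaction(latent, stride, rng):
    """Split a (C, H, W) latent into stride*stride dilated samples.

    Positions inside every stride x stride window are first shuffled by a
    random permutation (a bijection), so each sample mixes window positions.
    Returns the samples and the permutations needed to undo the shuffle.
    """
    c, h, w = latent.shape
    s = stride
    # Group pixels by window: (C, H/s, W/s, s*s).
    win = latent.reshape(c, h // s, s, w // s, s).transpose(0, 1, 3, 2, 4)
    win = win.reshape(c, h // s, w // s, s * s)
    # One independent permutation per window, shared across channels.
    perm = np.argsort(rng.random((h // s, w // s, s * s)), axis=-1)
    shuffled = np.take_along_axis(win, perm[None], axis=-1)
    samples = np.moveaxis(shuffled, -1, 0)  # (s*s, C, H/s, W/s)
    return samples, perm

def merge_dilated_samples(samples, perm, stride):
    """Invert the window shuffle and reassemble the (C, H, W) latent."""
    s = stride
    _, c, hs, ws = samples.shape
    shuffled = np.moveaxis(samples, 0, -1)   # (C, H/s, W/s, s*s)
    inverse = np.argsort(perm, axis=-1)      # inverse of each permutation
    win = np.take_along_axis(shuffled, inverse[None], axis=-1)
    win = win.reshape(c, hs, ws, s, s).transpose(0, 1, 3, 2, 4)
    return win.reshape(c, hs * s, ws * s)
```

In a real pipeline each dilated sample would be denoised between the split and the merge; here the round trip simply returns the original latent, which is a quick check that the window mapping is indeed a bijection.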
