AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Zhihang Lin, Mingbao Lin, Meng Zhao, Rongrong Ji
2024-07-18
Abstract: This paper attempts to address the object repetition issue in patch-wise higher-resolution image generation. We propose AccDiffusion, an accurate method for patch-wise higher-resolution image generation without training. An in-depth analysis in this paper reveals an identical text prompt for different patches causes repeated object generation, while no prompt compromises the image details. Therefore, our AccDiffusion, for the first time, proposes to decouple the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each of which serves as a more precise description of an image patch. Besides, AccDiffusion also introduces dilated sampling with window interaction for better global consistency in higher-resolution image generation. Experimental comparison with existing methods demonstrates that our AccDiffusion effectively addresses the issue of repeated object generation and leads to better performance in higher-resolution image generation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses object repetition in patch-wise higher-resolution image generation. Existing methods such as MultiDiffusion and DemoFusion tend to generate repeated objects because the same text prompt is applied to every image patch during generation. These methods also suffer from poor global consistency at higher resolutions. To tackle both issues, the authors propose AccDiffusion, a training-free method with two main contributions:

1. **Patch-Content-Aware Prompts**: The vanilla image-content-aware prompt is decoupled into a set of patch-content-aware prompts, each of which describes the content of its patch more precisely. Cross-attention maps from the low-resolution generation process determine whether each word should be part of the prompt for a given patch, which avoids object repetition.
2. **Dilated Sampling with Window Interaction**: Dilated sampling is combined with window interaction to improve global consistency. A bijective function lets different samples interact during the denoising process, which yields smoother global semantic information.

The experiments reported in the paper show that AccDiffusion effectively addresses object repetition in higher-resolution image generation and performs well in both quantitative and qualitative evaluations.
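
The two ideas summarized above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names, the mean-value threshold used to binarize each attention map, the `ratio` cutoff, and the use of a fresh random permutation per window as the bijective function are all assumptions made for illustration. Latents and attention maps are plain nested lists, and the latent size is assumed to be divisible by the stride.

```python
import random


def patch_prompts(words, attn_maps, patch_boxes, ratio=0.3):
    """Build one prompt per patch from per-word attention maps.

    words: list of prompt words.
    attn_maps: dict mapping each word to a 2D list of attention values.
    patch_boxes: list of (top, left, height, width) in map coordinates.
    ratio: hypothetical cutoff; a word is kept for a patch when more than
        this fraction of the patch lies inside the word's binary mask.
    """
    masks = {}
    for word in words:
        amap = attn_maps[word]
        flat = [v for row in amap for v in row]
        mean = sum(flat) / len(flat)
        # Assumption: binarize each map at its own mean value.
        masks[word] = [[v > mean for v in row] for row in amap]

    prompts = []
    for top, left, height, width in patch_boxes:
        kept = []
        for word in words:
            hits = sum(
                masks[word][y][x]
                for y in range(top, top + height)
                for x in range(left, left + width)
            )
            if hits / (height * width) > ratio:
                kept.append(word)
        prompts.append(" ".join(kept))
    return prompts


def dilated_samples(latent, stride, rng):
    """Split a 2D latent into stride*stride dilated samples.

    Each stride x stride window gets its own random permutation of pixel
    offsets (the bijection), so a given sample draws from different
    in-window positions in different windows.
    """
    rows, cols = len(latent) // stride, len(latent[0]) // stride
    offsets = [(dy, dx) for dy in range(stride) for dx in range(stride)]
    perms = [
        [rng.sample(offsets, len(offsets)) for _ in range(cols)]
        for _ in range(rows)
    ]
    samples = []
    for k in range(len(offsets)):
        samples.append([
            [
                latent[r * stride + perms[r][c][k][0]]
                      [c * stride + perms[r][c][k][1]]
                for c in range(cols)
            ]
            for r in range(rows)
        ])
    return samples, perms


def merge_samples(samples, perms, stride):
    """Invert dilated_samples: write every sample value back to its origin."""
    rows, cols = len(perms), len(perms[0])
    latent = [[None] * (cols * stride) for _ in range(rows * stride)]
    for k, sample in enumerate(samples):
        for r in range(rows):
            for c in range(cols):
                dy, dx = perms[r][c][k]
                latent[r * stride + dy][c * stride + dx] = sample[r][c]
    return latent


if __name__ == "__main__":
    # Toy maps: "cat" attends to the left half, "sky" to the right half.
    attn = {
        "cat": [[0.9, 0.9, 0.1, 0.1] for _ in range(4)],
        "sky": [[0.1, 0.1, 0.9, 0.9] for _ in range(4)],
    }
    boxes = [(0, 0, 4, 2), (0, 2, 4, 2)]  # left patch, right patch
    print(patch_prompts(["cat", "sky"], attn, boxes))  # ['cat', 'sky']

    latent = [[y * 4 + x for x in range(4)] for y in range(4)]
    samples, perms = dilated_samples(latent, 2, random.Random(0))
    print(merge_samples(samples, perms, 2) == latent)  # True
```

In a real pipeline each dilated sample would be denoised before merging, and the masks would be computed at low resolution and then mapped onto the higher-resolution patch grid. Because the mapping is a bijection, merging places every denoised value back at exactly one position, with no overlaps or gaps.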