Abstract: Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains due to their creative and high-fidelity image generation. Nonetheless, existing large-scale diffusion models are confined to generating images of up to 1K resolution, which is far from meeting the demands of contemporary commercial applications. Directly sampling higher-resolution images often yields results marred by artifacts such as object repetition and distorted shapes. Addressing the aforementioned issues typically necessitates training or fine-tuning models on higher-resolution datasets. However, this poses a formidable challenge due to the difficulty in collecting large-scale high-resolution images and substantial computational resources. While several preceding works have proposed alternatives to bypass the cumbersome training process, they often fail to produce convincing results. In this work, we probe the generative ability of diffusion models at higher resolution beyond their original capability and propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images. Our method obviates the need for additional training or fine-tuning which significantly lowers the burden of computational costs. Extensive experiments and results validate the efficiency and efficacy of our method. Project page: https://yhyun225.github.io/DiffuseHigh/

What problem does this paper attempt to address?

The paper addresses high-resolution image generation with pre-trained diffusion models. Existing large-scale diffusion models (such as text-to-image diffusion models) are limited when asked to generate images beyond their training resolution: sampling directly at a higher resolution often produces repeated objects and distorted shapes. Retraining or fine-tuning on higher-resolution datasets is the usual remedy, but it requires a large amount of high-quality image data and substantial computational resources.

The paper proposes a method called DiffuseHigh, which generates high-resolution images without additional training or fine-tuning. It first generates a low-resolution image and then uses that image as structural guidance for the higher-resolution generation process. Specifically, the paper employs the Discrete Wavelet Transform (DWT) to extract global structural information from the low-resolution image and applies it during the denoising process, so that the generated sample keeps a coherent overall structure while gaining detail. To further improve image quality, the paper introduces a sharpening operation that reduces the blur caused by interpolation.

Experimental results show that DiffuseHigh outperforms other training-free methods in image generation at different resolutions, with significant advantages in both generation time and image quality. Overall, the paper provides an efficient and effective way for pre-trained diffusion models to generate high-quality, high-resolution images without modifying their weights or architecture.
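The wavelet-based structure guidance described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single-level Haar wavelet, treats the low-frequency (LL) band as the "global structural information", and assumes guidance means replacing the LL band of the current high-resolution estimate with that of the upsampled low-resolution image. The function names `haar_dwt2`, `haar_idwt2`, and `apply_structure_guidance` are hypothetical, and the example operates on plain 2D arrays rather than on diffusion latents.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT of an array with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency band: coarse structure
    lh = (a - b + c - d) / 2.0  # detail: differences between columns
    hl = (a + b - c - d) / 2.0  # detail: differences between rows
    hh = (a - b - c + d) / 2.0  # detail: diagonal differences
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    lh, hl, hh = details
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def apply_structure_guidance(guide, sample):
    """Assumed guidance step: keep the sample's detail bands, but take
    the low-frequency band from the (upsampled) guide image."""
    ll_guide, _ = haar_dwt2(guide)
    _, details_sample = haar_dwt2(sample)
    return haar_idwt2(ll_guide, details_sample)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_res = rng.random((4, 4))
    # Nearest-neighbour upsampling stands in for the interpolation step.
    guide = np.kron(low_res, np.ones((2, 2)))
    # Random values stand in for an intermediate high-resolution estimate.
    sample = rng.random((8, 8))
    guided = apply_structure_guidance(guide, sample)
    print(guided.shape)
```

In this sketch the guided result inherits its coarse layout from the low-resolution image, while the high-frequency detail still comes from the high-resolution sample. A library such as PyWavelets could replace the hand-written Haar transform.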

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

High-Resolution Image Editing via Multi-Stage Blended Diffusion

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

One-step Generative Diffusion for Realistic Extreme Image Rescaling

One Diffusion to Generate Them All

Upsample Guidance: Scale Up Diffusion Models without Training

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Accelerated Image-Aware Generative Diffusion Modeling

High-Resolution Image Synthesis with Latent Diffusion Models

Matryoshka Diffusion Models

Efficient image generation with Contour Wavelet Diffusion

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis