DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Younghyun Kim, Geunmin Hwang, Junyu Zhang, Eunbyung Park
2024-08-27
Abstract: Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains due to their creative and high-fidelity image generation. Nonetheless, existing large-scale diffusion models are confined to generating images of up to 1K resolution, which is far from meeting the demands of contemporary commercial applications. Directly sampling higher-resolution images often yields results marred by artifacts such as object repetition and distorted shapes. Addressing the aforementioned issues typically necessitates training or fine-tuning models on higher-resolution datasets. However, this poses a formidable challenge due to the difficulty in collecting large-scale high-resolution images and substantial computational resources. While several preceding works have proposed alternatives to bypass the cumbersome training process, they often fail to produce convincing results. In this work, we probe the generative ability of diffusion models at higher resolution beyond their original capability and propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images. Our method obviates the need for additional training or fine-tuning which significantly lowers the burden of computational costs. Extensive experiments and results validate the efficiency and efficacy of our method. Project page: https://yhyun225.github.io/DiffuseHigh/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses high-resolution image generation with pre-trained diffusion models. Existing large-scale models, such as text-to-image diffusion models, degrade when asked to generate images beyond their training resolution: sampling directly at a higher resolution often produces repeated objects and distorted shapes. Retraining or fine-tuning on higher-resolution datasets avoids this, but it requires a large amount of high-quality image data and substantial computational resources.

The paper proposes DiffuseHigh, a method that generates high-resolution images without additional training or fine-tuning. It works progressively, using a generated low-resolution image as structural guidance for the higher-resolution generation process. Specifically, it applies the Discrete Wavelet Transform (DWT) to extract global structural information from the low-resolution image and uses that information during the denoising process to preserve the structure of the generated sample while details are refined. It also introduces a sharpening operation to reduce the blur caused by interpolation.

According to the reported experiments, DiffuseHigh outperforms other training-free methods at several resolutions in both generation time and image quality. Overall, the paper presents a way for pre-trained diffusion models to generate high-quality high-resolution images without modifying their weights or architecture.
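The DWT-based structure guidance described above can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not confirm: a single-level Haar wavelet, guidance applied by swapping the low-frequency (approximation) band of an intermediate prediction for that of the upsampled low-resolution guide, and grayscale images stored as plain nested lists with even dimensions. The function names `haar_dwt2`, `haar_idwt2`, and `structure_guidance` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of a grid (list of rows) with even height and width.

    Returns (ll, lh, hl, hh): the low-frequency approximation band followed by
    three detail bands, each half the input size in both dimensions.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            # One 2x2 block: a b on the top row, c d on the bottom row.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # block average (coarse structure)
            r_lh.append((a + b - c - d) / 2)  # top-minus-bottom detail
            r_hl.append((a - b + c - d) / 2)  # left-minus-right detail
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-size grid from the four bands."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, v, h, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + v + h + d) / 2, (s + v - h - d) / 2]
            bottom += [(s - v + h - d) / 2, (s - v - h + d) / 2]
        out += [top, bottom]
    return out


def structure_guidance(pred, guide):
    """Keep the detail bands of `pred` but take the low-frequency band from `guide`.

    Assumption: `pred` stands in for an intermediate high-resolution estimate
    and `guide` for the upsampled low-resolution image, both the same size.
    """
    guide_ll, _, _, _ = haar_dwt2(guide)
    _, lh, hl, hh = haar_dwt2(pred)
    return haar_idwt2(guide_ll, lh, hl, hh)
```

Calling `structure_guidance(pred, guide)` on two equally sized grids returns a grid whose coarse layout follows `guide` while its fine detail comes from `pred`. In a real pipeline the same idea would be applied per channel on tensors, typically with a wavelet library rather than hand-written loops.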