CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji
2024-04-23
Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapolation but cuts a standard patch diffusion process into an initial phase focused on comprehensive structure denoising and a subsequent phase dedicated to specific detail refinement. Comprehensive experiments highlight the numerous almighty advantages of CutDiffusion: (1) simple method construction that enables a concise higher-resolution diffusion process without third-party engagement; (2) fast inference speed achieved through a single-step higher-resolution diffusion process, and fewer inference patches required; (3) cheap GPU cost resulting from patch-wise inference and fewer patches during the comprehensive structure denoising; (4) strong generation performance, stemming from the emphasis on specific detail refinement.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper proposes CutDiffusion, a tuning-free method that aims to simplify and accelerate diffusion extrapolation, i.e., adapting a large diffusion model pre-trained at low resolution to generate higher-resolution images, while lowering its cost and improving generation quality. Diffusion models are commonly used to generate detailed images from textual descriptions, but a model pre-trained at low resolution adapts poorly when a higher resolution is required. CutDiffusion keeps the existing patch-wise extrapolation approach but cuts the standard patch diffusion process into two phases: an initial phase focused on comprehensive structure denoising, followed by a phase dedicated to specific detail refinement. The paper lists four advantages:
1. Simple construction: the higher-resolution diffusion process is concise and requires no third-party involvement.
2. Fast inference: the method uses a single-step higher-resolution diffusion process and requires fewer inference patches.
3. Low GPU cost: inference is patch-wise, and the structure denoising phase uses fewer patches.
4. Strong generation performance: the emphasis on specific detail refinement improves generation quality.
Because the method is tuning-free, the parameters of the pre-trained model are left unchanged. The paper also compares CutDiffusion with existing methods and reports faster inference and lower GPU cost while maintaining high-quality image generation.
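The cost argument in points 2 and 3 can be illustrated with a small counting sketch. It is not taken from the paper: the patch size, stride, step counts, and all function names are hypothetical, and it assumes that the structure phase tiles the canvas with non-overlapping patches while the detail phase uses overlapping sliding windows. It only counts denoiser calls and does not run a diffusion model.

```python
def patch_starts(size, patch, stride):
    """Start offsets of sliding windows covering [0, size).

    The last window is clamped to the edge so the whole axis is covered.
    """
    if patch > size:
        raise ValueError("patch must not exceed the canvas size")
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def count_patches(height, width, patch, stride):
    """Number of patches needed to cover a height x width canvas."""
    rows = len(patch_starts(height, patch, stride))
    cols = len(patch_starts(width, patch, stride))
    return rows * cols


def two_phase_calls(height, width, patch, overlap_stride,
                    total_steps, structure_steps):
    """Denoiser calls per phase under the assumptions stated above.

    Phase 1 (structure): non-overlapping tiles, stride == patch (assumption).
    Phase 2 (detail): overlapping windows with the given stride (assumption).
    """
    structure_patches = count_patches(height, width, patch, patch)
    detail_patches = count_patches(height, width, patch, overlap_stride)
    phase1 = structure_steps * structure_patches
    phase2 = (total_steps - structure_steps) * detail_patches
    return phase1, phase2


# Hypothetical setting: 256x256 latent canvas, 128x128 patches,
# stride 64 for overlapping windows, 50 steps split evenly.
phase1, phase2 = two_phase_calls(256, 256, 128, 64, 50, 25)
baseline = 50 * count_patches(256, 256, 128, 64)  # overlapping patches at every step
print(phase1 + phase2, "calls versus", baseline, "for the all-overlapping baseline")
```

Under these made-up numbers, the structure phase needs 4 patches per step instead of 9, which is where the saving in denoiser calls comes from. The actual split between phases and the patch layout used by CutDiffusion should be taken from the paper itself.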