FreeU: Free Lunch in Diffusion U-Net

Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu
2023-10-18
Abstract: In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the U-Net architecture to the denoising process and identify that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, causing the network to overlook the backbone semantics. Capitalizing on this discovery, we propose a simple yet effective method, termed "FreeU", that enhances generation quality without additional training or finetuning. Our key insight is to strategically re-weight the contributions sourced from the U-Net's skip connections and backbone feature maps, to leverage the strengths of both components of the U-Net architecture. Promising results on image and video generation tasks demonstrate that our FreeU can be readily integrated into existing diffusion models, e.g., Stable Diffusion, DreamBooth, ModelScope, Rerender and ReVersion, to improve the generation quality with only a few lines of code. All you need is to adjust two scaling factors during inference. Project page: https://chenyangsi.top/FreeU/.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper examines how the U-Net architecture in diffusion models (such as Denoising Diffusion Probabilistic Models, DDPM) contributes to the denoising process, and uses that analysis to improve generation quality.

1. **Contribution differences of the U-Net architecture**:
   - The backbone network is primarily responsible for the denoising task.
   - Skip connections mainly introduce high-frequency features into the decoder module, which can cause the network to overlook the semantic information carried by the backbone.
2. **Proposed method**:
   - The paper proposes "FreeU," which improves generation quality by re-weighting the skip-connection and backbone feature maps in the U-Net, without requiring additional training or fine-tuning.
3. **Objective**:
   - To improve the quality of image and video generation, so that existing diffusion models (such as Stable Diffusion, DreamBooth, etc.) produce better results at very low cost.

The paper reports that FreeU improves the detail and overall quality of generated samples across text-to-image, text-to-video, and downstream tasks (such as personalized image generation), and that it does so without additional training or fine-tuning.
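
To make the re-weighting idea concrete, here is a minimal NumPy sketch of scaling the backbone and skip features before they are concatenated for a decoder block. It is an illustration under stated assumptions, not the authors' implementation: the function name `reweight_decoder_input`, the factor names `b` and `s`, the example values 1.2 and 0.9, and the choice to scale every channel uniformly are all hypothetical. The summary above does not say which channels or frequency bands the paper actually scales.

```python
import numpy as np


def reweight_decoder_input(backbone, skip, b=1.2, s=0.9):
    """Scale backbone and skip features, then concatenate along channels.

    backbone, skip: arrays of shape (batch, channels, height, width).
    b, s: hypothetical names for the two inference-time scaling factors;
    the default values are illustrative, not taken from the paper.
    """
    same_batch = backbone.shape[0] == skip.shape[0]
    same_spatial = backbone.shape[2:] == skip.shape[2:]
    if not (same_batch and same_spatial):
        raise ValueError("backbone and skip must share batch and spatial dims")
    # Assumption: uniform scaling of all channels in each branch.
    return np.concatenate([b * backbone, s * skip], axis=1)


rng = np.random.default_rng(0)
backbone = rng.standard_normal((1, 4, 8, 8))  # stand-in for decoder backbone features
skip = rng.standard_normal((1, 4, 8, 8))      # stand-in for encoder skip features

# b = s = 1.0 reproduces the plain U-Net concatenation.
baseline = reweight_decoder_input(backbone, skip, b=1.0, s=1.0)
# b > 1 amplifies the backbone; s < 1 attenuates the skip features.
reweighted = reweight_decoder_input(backbone, skip, b=1.2, s=0.9)
print(reweighted.shape)  # (1, 8, 8, 8)
```

Because both factors act only on activations at inference time, no weights change, which is consistent with the claim that the method needs no training or fine-tuning.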