FreeU: Free Lunch in Diffusion U-Net

Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

2023-10-18

Abstract: In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the U-Net architecture to the denoising process and identify that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, causing the network to overlook the backbone semantics. Capitalizing on this discovery, we propose a simple yet effective method, termed "FreeU", that enhances generation quality without additional training or finetuning. Our key insight is to strategically re-weight the contributions sourced from the U-Net's skip connections and backbone feature maps, to leverage the strengths of both components of the U-Net architecture. Promising results on image and video generation tasks demonstrate that our FreeU can be readily integrated into existing diffusion models, e.g., Stable Diffusion, DreamBooth, ModelScope, Rerender and ReVersion, to improve the generation quality with only a few lines of code. All you need is to adjust two scaling factors during inference. Project page: https://chenyangsi.top/FreeU/.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper examines how the U-Net architecture contributes to the denoising process in diffusion models (such as Denoising Diffusion Probabilistic Models, DDPM) and uses that analysis to improve generation quality.

1. **Contribution differences within the U-Net architecture**:
   - The backbone network is primarily responsible for the denoising task.
   - Skip connections mainly introduce high-frequency features into the decoder module, which may cause the network to ignore the semantic information of the backbone network.
2. **Proposed method**:
   - The paper proposes a method called "FreeU", which enhances generation quality by re-weighting the skip connections and backbone feature maps in the U-Net architecture, without requiring additional training or fine-tuning.
3. **Objective**:
   - To improve the quality of image and video generation, enabling existing diffusion models (such as Stable Diffusion, DreamBooth, etc.) to produce better results at very low cost.

The paper reports that FreeU improves the detail and overall quality of generated samples across text-to-image, text-to-video, and downstream tasks (such as personalized image generation), without adding extra computational burden.
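The re-weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default values of `b` and `s`, and the use of flat Python lists in place of feature tensors are all assumptions made for illustration. The sketch also assumes that the decoder concatenates backbone and skip features, as is common in diffusion U-Nets, and it simplifies "low-frequency" to the mean (zero-frequency component) of the skip features.

```python
from statistics import fmean


def reweight_features(backbone, skip, b=1.2, s=0.9):
    """Return (scaled_backbone, scaled_skip) for two flat feature lists.

    Hypothetical helper: `b` scales every backbone value, while `s` rescales
    only the mean (zero-frequency component) of the skip features and leaves
    the variation around the mean untouched.
    """
    scaled_backbone = [b * x for x in backbone]
    mean = fmean(skip)
    scaled_skip = [s * mean + (x - mean) for x in skip]
    return scaled_backbone, scaled_skip


def decoder_input(backbone, skip, b=1.2, s=0.9):
    """Concatenate the re-weighted features (assumed decoder-block input)."""
    scaled_backbone, scaled_skip = reweight_features(backbone, skip, b, s)
    return scaled_backbone + scaled_skip


if __name__ == "__main__":
    # b = s = 1.0 leaves the features unchanged; other values shift the balance.
    print(decoder_input([1.0, 2.0], [1.0, 3.0], b=2.0, s=0.5))
```

Because both factors are applied only at inference time, no weights change; setting `b = s = 1.0` recovers the unmodified features, which makes the effect of each factor easy to compare in isolation.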
