Diffusion Tuning: Transferring Diffusion Models via Chain of Forgetting

Jincheng Zhong, Xingzhuo Guo, Jiaxiang Dong, Mingsheng Long

2024-06-06

Abstract: Diffusion models have significantly advanced the field of generative modeling. However, training a diffusion model is computationally expensive, creating a pressing need to adapt off-the-shelf diffusion models for downstream generation tasks. Current fine-tuning methods focus on parameter-efficient transfer learning but overlook the fundamental transfer characteristics of diffusion models. In this paper, we investigate the transferability of diffusion models and observe a monotonic "chain of forgetting" trend of transferability along the reverse process. Based on this observation and novel theoretical insights, we present Diff-Tuning, a frustratingly simple transfer approach that leverages the chain-of-forgetting tendency. Diff-Tuning encourages the fine-tuned model to retain the pre-trained knowledge at the end of the denoising chain, close to the generated data, while discarding it at the other end, close to the noise. We conduct comprehensive experiments to evaluate Diff-Tuning, including the transfer of pre-trained Diffusion Transformer models to eight downstream generation tasks and the adaptation of Stable Diffusion to five control conditions with ControlNet. Diff-Tuning achieves a 26% improvement over standard fine-tuning and speeds up the convergence of ControlNet by 24%. Notably, parameter-efficient transfer learning techniques for diffusion models can also benefit from Diff-Tuning.

Machine Learning, Computer Vision and Pattern Recognition
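The chain-of-forgetting trend concerns which steps of the reverse process can rely on pre-trained knowledge and which need the fine-tuned model. One hypothetical way to probe it, sketched here under stated assumptions and not necessarily the authors' protocol, is to run the reverse chain with the fine-tuned denoiser on the noisy steps, hand the final near-data steps to the pre-trained denoiser, and sweep the switch point. The names `hybrid_sample`, `pretrained_step`, `finetuned_step`, and `switch_t` are illustrative; samples are plain floats only to keep the sketch self-contained, and a real probe would plug in actual denoising updates and a sample-quality metric.

```python
from typing import Callable, List, Tuple

# A denoising step maps (x_t, t) to x_{t-1}. Convention assumed here: t = T is
# pure noise and t = 0 is data, so the reverse chain runs t = T, T-1, ..., 1.
Step = Callable[[float, int], float]


def hybrid_sample(
    x_T: float,
    T: int,
    pretrained_step: Step,
    finetuned_step: Step,
    switch_t: int,
) -> Tuple[float, List[str]]:
    """Denoise from t = T down to t = 1, switching models at switch_t.

    Noisy steps (t > switch_t) use the fine-tuned denoiser; the final,
    near-data steps (t <= switch_t) use the pre-trained denoiser.
    Returns the final sample and which model handled each step.
    """
    x = x_T
    trace: List[str] = []
    for t in range(T, 0, -1):
        if t > switch_t:
            x = finetuned_step(x, t)
            trace.append("finetuned")
        else:
            x = pretrained_step(x, t)
            trace.append("pretrained")
    return x, trace


if __name__ == "__main__":
    # Hypothetical placeholder denoisers; a real probe would call trained models.
    def pretrained(x: float, t: int) -> float:
        return 0.5 * x

    def finetuned(x: float, t: int) -> float:
        return x + 1.0

    for switch_t in (0, 2, 4):
        x0, trace = hybrid_sample(0.0, 4, pretrained, finetuned, switch_t)
        print(switch_t, x0, trace)
```

Sweeping `switch_t` from 0 (fine-tuned model at every step) to T (pre-trained model at every step) shows how much of the chain can be left to the pre-trained model before quality drops; the observation in the abstract suggests the near-data steps tolerate the pre-trained model best.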

What problem does this paper attempt to address?

The paper addresses the problem of fine-tuning diffusion models for downstream generation tasks. Training a new diffusion model from scratch requires substantial computational resources, so efficiently adapting a pre-trained diffusion model to specific downstream tasks has become a key issue. Existing fine-tuning methods mainly focus on parameter-efficient transfer learning and overlook the transfer characteristics of diffusion models themselves. The authors observe a monotonic chain-of-forgetting trend in transferability along the reverse (denoising) process and build Diff-Tuning on this observation. Diff-Tuning is a simple fine-tuning method that exploits this trend: it encourages the fine-tuned model to retain pre-trained knowledge at the end of the denoising chain, close to the generated data, while discarding it at the noisy end of the chain. In experiments it clearly outperforms standard fine-tuning: when transferring pre-trained Diffusion Transformer models to eight downstream generation tasks, it achieves a 26% improvement over standard fine-tuning, and when adapting Stable Diffusion to five control conditions with ControlNet, it speeds up convergence by 24%. Diff-Tuning can also improve existing parameter-efficient transfer learning techniques. Overall, the paper supports its method with both theoretical analysis and experiments on multiple datasets.
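A minimal sketch of how such a timestep-dependent trade-off could be written as a training objective, assuming two standard noise-prediction losses: a retention term on samples generated by the pre-trained model and an adaptation term on downstream data. The weights `retention_weight` (largest at the data end, t = 0) and `adaptation_weight` (largest at the noise end, t = T) are hypothetical linear placeholders, not the paper's schedules. The function names are illustrative, and the per-sample squared errors stand in for ||eps - eps_theta(x_t, t)||^2 computed by an actual model.

```python
from typing import Callable, Sequence


def retention_weight(t: int, T: int) -> float:
    """Hypothetical xi(t): 1 at the data end (t = 0), 0 at the noise end (t = T)."""
    return 1.0 - t / T


def adaptation_weight(t: int, T: int) -> float:
    """Hypothetical psi(t): 0 at the data end (t = 0), 1 at the noise end (t = T)."""
    return t / T


def _weighted_mean(
    ts: Sequence[int],
    errs: Sequence[float],
    T: int,
    weight: Callable[[int, int], float],
) -> float:
    """Mean of weight(t, T) * err over a mini-batch of (timestep, error) pairs."""
    if len(ts) != len(errs) or not ts:
        raise ValueError("need equal-length, non-empty timesteps and errors")
    return sum(weight(t, T) * e for t, e in zip(ts, errs)) / len(errs)


def diff_tuning_style_loss(
    retain_ts: Sequence[int],
    retain_errs: Sequence[float],
    adapt_ts: Sequence[int],
    adapt_errs: Sequence[float],
    T: int,
) -> float:
    """Retention term (pre-trained samples) plus adaptation term (downstream data)."""
    retain = _weighted_mean(retain_ts, retain_errs, T, retention_weight)
    adapt = _weighted_mean(adapt_ts, adapt_errs, T, adaptation_weight)
    return retain + adapt


if __name__ == "__main__":
    T = 1000
    # Toy squared errors at a near-data step (t = 100) and a near-noise step (t = 900).
    print(diff_tuning_style_loss([100, 900], [1.0, 1.0], [100, 900], [1.0, 1.0], T))
```

With T = 1000, an error at t = 100 is weighted 0.9 in the retention term but only 0.1 in the adaptation term, so this placeholder objective pulls the model toward pre-trained behavior near the data end and leaves it free to adapt near the noise end.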