SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Teng Hu, Jiangning Zhang, Ran Yi, Hongrui Huang, Yabiao Wang, Lizhuang Ma
2024-09-11
Abstract: In recent years, the development of diffusion models has led to significant progress in image and video generation tasks, with pre-trained models like the Stable Diffusion series playing a crucial role. Inspired by model pruning, which lightens large pre-trained models by removing unimportant parameters, we propose a novel model fine-tuning method to make full use of these ineffective parameters and enable the pre-trained model with new task-specified capabilities. In this work, we first investigate the importance of parameters in pre-trained diffusion models, and discover that the smallest 10% to 20% of parameters by absolute values do not contribute to the generation process. Based on this observation, we propose a method termed SaRA that re-utilizes these temporarily ineffective parameters, equating to optimizing a sparse weight matrix to learn the task-specific knowledge. To mitigate overfitting, we propose a nuclear-norm-based low-rank sparse training scheme for efficient fine-tuning. Furthermore, we design a new progressive parameter adjustment strategy to make full use of the re-trained/finetuned parameters. Finally, we propose a novel unstructural backpropagation strategy, which significantly reduces memory costs during fine-tuning. Our method enhances the generative capabilities of pre-trained models in downstream applications and outperforms traditional fine-tuning methods like LoRA in maintaining the model's generalization ability. We validate our approach through fine-tuning experiments on SD models, demonstrating significant improvements. SaRA also offers a practical advantage: it requires only a single line of code modification for efficient implementation and is seamlessly compatible with existing methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to fine-tune pre-trained diffusion models effectively so that they adapt to new downstream tasks. Existing fine-tuning methods fall short in efficiency, performance, or generalization ability:

1. **Additive Fine-Tuning (AFT)**: Fine-tunes the model by introducing additional modules, which changes the original model structure and increases inference latency.
2. **Reparameterization Fine-Tuning (RFT)**: Uses low-rank matrices to learn new information, but risks overfitting and requires choosing specific layers and ranks for each model.
3. **Selective Fine-Tuning (SFT)**: Selects a portion of the model's parameters for fine-tuning, but the parameter selection process is complex and memory-intensive.

To address these issues, the paper proposes a new fine-tuning method, SaRA (Sparse Low-Rank Adaptation), which reuses the temporarily ineffective parameters of the pre-trained model to improve its generative ability on downstream tasks while maintaining its generalization ability. SaRA consists of the following key steps:

1. **Identifying Ineffective Parameters**: Computes a sparse mask that selects the parameters with small absolute values in the pre-trained model, since these parameters have only a minor impact on the current model output.
2. **Sparse Matrix Update**: Updates these ineffective parameters through the sparse mask while keeping the other, effective parameters unchanged.
3. **Low-Rank Constraint**: Introduces a nuclear-norm-based low-rank constraint that keeps the rank of the sparse matrix from growing too high, thereby avoiding overfitting.
4. **Progressive Parameter Adjustment**: Gradually reselects and updates ineffective parameters through a progressive strategy, so that all parameters are fully used.
5. **Unstructured Backpropagation**: Proposes an unstructured backpropagation strategy that significantly reduces memory consumption and improves fine-tuning efficiency.

With these components, SaRA performs well on multiple downstream tasks, including domain transfer, customized generation, image editing, and 3D generation. It is also simple to implement, requiring only one line of code modification to apply.
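The first three steps above (mask selection, masked update, and the nuclear-norm term) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `small_magnitude_mask`, `masked_sgd_step`, and `nuclear_norm` are hypothetical, plain SGD stands in for whatever optimizer the paper uses, and the 10% fraction is simply taken from the low end of the 10% to 20% range mentioned in the abstract. Progressive parameter adjustment and unstructured backpropagation are not covered here.

```python
import numpy as np


def small_magnitude_mask(weight, fraction=0.1):
    """Boolean mask over the `fraction` of entries with the smallest |w|.

    Hypothetical helper: the selection rule is an assumption based on the
    summary ("parameters with small absolute values").
    """
    k = int(round(fraction * weight.size))
    if k == 0:
        return np.zeros(weight.shape, dtype=bool)
    abs_w = np.abs(weight)
    # k-th smallest absolute value acts as the selection threshold.
    threshold = np.partition(abs_w.ravel(), k - 1)[k - 1]
    return abs_w <= threshold


def masked_sgd_step(weight, grad, mask, lr=0.1):
    """One SGD step that changes only the masked entries."""
    return weight - lr * grad * mask


def nuclear_norm(matrix):
    """Sum of singular values, usable as a low-rank penalty term."""
    return np.linalg.svd(matrix, compute_uv=False).sum()


# Toy demonstration on a random "pre-trained" weight matrix.
rng = np.random.default_rng(0)
pretrained = rng.standard_normal((16, 16))
mask = small_magnitude_mask(pretrained, fraction=0.1)

grad = rng.standard_normal(pretrained.shape)  # stand-in for a task gradient
updated = masked_sgd_step(pretrained, grad, mask, lr=0.01)

# The learned change is a sparse matrix: zero wherever the mask is False.
delta = updated - pretrained
penalty = nuclear_norm(delta)  # would be added to the task loss, scaled by a coefficient
print("trainable entries:", int(mask.sum()), "of", mask.size)
print("nuclear norm of sparse update:", float(penalty))
```

In this sketch the effective parameters are untouched by construction, because the gradient is multiplied by the mask before the step. In a real training loop the nuclear-norm term would be added to the task loss with a weighting coefficient, whose value is not given in this summary.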