DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution

Yulong Mao, Kaiyu Huang, Changhao Guan, Ganglin Bao, Fengran Mo, Jinan Xu
2024-06-26
Abstract: Fine-tuning large-scale pre-trained models is inherently a resource-intensive task. While it can enhance the capabilities of the model, it also incurs substantial computational costs, posing challenges to the practical application of downstream tasks. Existing parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) rely on a bypass framework that ignores the differential parameter budget requirements across weight matrices, which may lead to suboptimal fine-tuning outcomes. To address this issue, we introduce the Dynamic Low-Rank Adaptation (DoRA) method. DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing for dynamic pruning of parameter budget based on their importance to specific tasks during training, which makes the most of the limited parameter budget. Experimental results demonstrate that DoRA can achieve competitive performance compared with LoRA and full model fine-tuning, and outperform various strong baselines with the same storage parameter budget. Our code is available at https://github.com/MIkumikumi0116/DoRA
Computation and Language
What problem does this paper attempt to address?
This paper proposes Dynamic Low-Rank Adaptation (DoRA), a method for reducing the resource cost of fine-tuning large-scale pre-trained language models. Existing parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), use a bypass framework that adds trainable low-rank matrices alongside the frozen weight matrices. LoRA gives every weight matrix the same rank, however, and so ignores the fact that different weight matrices need different parameter budgets, which may lead to suboptimal fine-tuning results. DoRA decomposes each high-rank LoRA layer into structured single-rank components and, during training, dynamically prunes those components according to their contribution to the specific task, so that the limited parameter budget goes where it is most useful.

The main contributions of the paper are as follows:
1. It introduces the DoRA method, which outperforms full model fine-tuning while using less than 0.3% of the trainable parameters.
2. DoRA can effectively identify the modules of the pre-trained model that are crucial for a fine-tuning task and allocate larger parameter budgets to these key modules.
3. Under the same storage parameter budget, DoRA surpasses various baseline methods on multiple downstream tasks.

Unlike LoRA and AdaLoRA, DoRA decomposes the high-rank LoRA layer into single-rank components and allocates parameters dynamically based on the importance of each component. It also introduces a Dimension Equilibrium Modulator (DEM) loss to avoid the instability that pruning can cause. Experimental results show that DoRA outperforms the other baseline methods on natural language understanding, question answering, and text generation tasks, especially under limited parameter budgets, which demonstrates its efficiency in resource-constrained scenarios.
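
The decomposition and pruning idea described above can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from this summary: the function names (`rank_one_norm`, `prune_to_budget`, `delta_w`) are hypothetical, the importance score is taken to be the Frobenius norm of each scaled rank-one component, and pruning is done in one step by zeroing the scale of low-scoring components. The released code may score and schedule pruning differently, and the DEM loss is not shown.

```python
import math


def rank_one_norm(u, v, c):
    """Frobenius norm of c * outer(u, v), which equals |c| * ||u|| * ||v||."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return abs(c) * norm_u * norm_v


def prune_to_budget(components, budget):
    """Keep the `budget` highest-scoring components and zero the scale of the rest.

    Each component is a tuple (u, v, c): a column vector, a row vector,
    and a scalar scale. The score used here is an illustrative assumption.
    """
    scores = [rank_one_norm(u, v, c) for u, v, c in components]
    ranked = sorted(range(len(components)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:budget])
    return [(u, v, c if i in keep else 0.0) for i, (u, v, c) in enumerate(components)]


def delta_w(components, d_out, d_in):
    """Sum the scaled rank-one components into the full weight update."""
    w = [[0.0] * d_in for _ in range(d_out)]
    for u, v, c in components:
        if c == 0.0:
            continue  # pruned components contribute nothing
        for i in range(d_out):
            for j in range(d_in):
                w[i][j] += c * u[i] * v[j]
    return w


# A rank-3 update on a 2x2 weight matrix, written as three single-rank components.
components = [
    ([1.0, 0.0], [1.0, 0.0], 0.5),
    ([0.0, 1.0], [0.0, 1.0], 2.0),
    ([1.0, 1.0], [1.0, 1.0], 1.0),
]
pruned = prune_to_budget(components, budget=2)
active = sum(1 for _, _, c in pruned if c != 0.0)
print(active, delta_w(pruned, 2, 2))
```

In this toy example the first component has the smallest norm, so it is the one removed when the budget is two. Spreading one global budget over the components of all adapted weight matrices, rather than pruning each matrix separately, is what would let important modules keep more rank than unimportant ones.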