Abstract: Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates. This low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to more accurately approximate the full fine-tuning gradient, thereby narrowing the performance gap between LoRA and full fine-tuning. Furthermore, we theoretically derive the optimal solutions for adjusting the gradients of the low-rank matrices and apply them during fine-tuning in LoRA-Pro. We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance, effectively narrowing the gap with full fine-tuning. Code is publicly available at https://github.com/mrflogs/LoRA-Pro.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the performance gap between Low-Rank Adaptation (LoRA) and Full Fine-Tuning in parameter-efficient fine-tuning. Although LoRA is computationally efficient, it still underperforms Full Fine-Tuning. The paper shows that optimizing with LoRA is mathematically equivalent to Full Fine-Tuning with a low-rank gradient, and it proposes a new method, LoRA-Pro, which narrows the gap by adjusting the gradients of the two low-rank matrices so that this low-rank gradient better approximates the Full Fine-Tuning gradient.
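The equivalence can be made concrete with a short chain-rule derivation. This is a sketch under assumed notation that the summary itself does not define: $W_0$ is the frozen pretrained weight, $B$ and $A$ are the two low-rank matrices, $s$ is a scaling factor, $\eta$ is the learning rate, $L$ is the loss, and plain gradient descent is assumed for the update.

```latex
\begin{align*}
W &= W_0 + sBA, \qquad B \in \mathbb{R}^{m \times r},\; A \in \mathbb{R}^{r \times n},\; r \ll \min(m, n) \\
g &= \frac{\partial L}{\partial W}, \qquad
g^{A} = \frac{\partial L}{\partial A} = s B^{\top} g, \qquad
g^{B} = \frac{\partial L}{\partial B} = s\, g A^{\top} \\
\mathrm{d}W &= s\,(\mathrm{d}B\, A + B\, \mathrm{d}A)
            = -\eta\, s\,(g^{B} A + B g^{A})
            = -\eta\, \tilde{g},
\qquad \tilde{g} = s\,(B g^{A} + g^{B} A)
\end{align*}
```

To first order, a LoRA step therefore moves $W$ exactly as a full fine-tuning step would if the full gradient $g$ were replaced by $\tilde{g}$, whose rank is at most $2r$.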
### Main Contributions
1. **Revealing the Connection in the Optimization Process**: The paper shows that optimizing with LoRA is mathematically equivalent to Full Fine-Tuning with a low-rank gradient, and that this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA.
2. **Proposing the LoRA-Pro Method**: By adjusting the gradients of the low-rank matrices, the method minimizes the difference between the low-rank gradients and the Full Fine-Tuning gradients, thereby improving the performance of LoRA.
3. **Theoretical Proof and Experimental Validation**: The paper provides a closed-form solution for the optimal gradients and validates the effectiveness of LoRA-Pro through extensive experiments, including tasks in natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification.
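The gradient adjustment described in the contributions above can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes the parameterization `W = W0 + s * B @ A`, full-rank `A` and `B` (so it does not apply at the usual `B = 0` initialization), and random matrices in place of real gradients. The function names are hypothetical. The closed form used here is the orthogonal-projection solution of the least-squares problem `min ||s * (B @ gA + gB @ A) - g||_F`; the paper's own closed form may be written differently.

```python
import numpy as np

def equivalent_gradient(A, B, gA, gB, s=1.0):
    """Low-rank gradient on W implied by gradients gA, gB on the factors."""
    return s * (B @ gA + gB @ A)

def adjusted_lora_grads(A, B, g, s=1.0):
    """Pick (gA, gB) so that s*(B@gA + gB@A) is as close as possible to g
    in Frobenius norm. Assumes B.T @ B and A @ A.T are invertible."""
    BtB_inv = np.linalg.inv(B.T @ B)
    AAt_inv = np.linalg.inv(A @ A.T)
    P_B = B @ BtB_inv @ B.T                      # projector onto col(B)
    gA = (BtB_inv @ B.T @ g) / s                 # captures the part of g in col(B)
    gB = ((np.eye(B.shape[0]) - P_B) @ g @ A.T @ AAt_inv) / s  # remainder, via row(A)
    return gA, gB

rng = np.random.default_rng(0)
m, n, r, s = 8, 6, 2, 2.0
A = rng.standard_normal((r, n))
B = rng.standard_normal((m, r))
g = rng.standard_normal((m, n))                  # stand-in for the full gradient

# Standard LoRA gradients from the chain rule.
gA_lora = s * B.T @ g
gB_lora = s * g @ A.T
err_lora = np.linalg.norm(equivalent_gradient(A, B, gA_lora, gB_lora, s) - g)

# Adjusted gradients.
gA_adj, gB_adj = adjusted_lora_grads(A, B, g, s)
err_adj = np.linalg.norm(equivalent_gradient(A, B, gA_adj, gB_adj, s) - g)

print(f"approximation error, standard LoRA grads: {err_lora:.4f}")
print(f"approximation error, adjusted grads:      {err_adj:.4f}")
```

Because the adjusted gradients solve the least-squares problem exactly, their approximation error can never exceed that of the unadjusted LoRA gradients for the same `A`, `B`, and `g`. The remaining residual is the part of `g` that lies outside both the column space of `B` and the row space of `A`, which no rank-constrained update of this form can capture.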
### Experimental Results
- **Natural Language Understanding Tasks**: On the GLUE benchmark, LoRA-Pro achieved the highest average scores across multiple sub-tasks, significantly outperforming standard LoRA and other variants.
- **Large-Scale Language Model Tasks**: In dialogue generation, mathematical reasoning, and code generation tasks, LoRA-Pro performed well across multiple datasets. In high-rank settings in particular, its performance approached or even surpassed that of Full Fine-Tuning.
- **Image Classification Tasks**: On multiple image classification datasets, LoRA-Pro's performance was significantly better than other methods, including zero-shot classification and standard LoRA.
### Conclusion
LoRA-Pro effectively narrows the performance gap between LoRA and Full Fine-Tuning by adjusting the gradients of the low-rank matrices, demonstrating superior performance across various tasks and models. These results validate the effectiveness and robustness of the LoRA-Pro method.