Abstract: Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathematically equivalent to full fine-tuning using a low-rank gradient for parameter updates. Moreover, this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of these low-rank matrices. This adjustment allows the low-rank gradient to more accurately approximate the full fine-tuning gradient, thereby narrowing the performance gap between LoRA and full fine-tuning. Furthermore, we theoretically derive the optimal solutions for adjusting the gradients of the low-rank matrices and apply them during fine-tuning in LoRA-Pro. We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance, effectively narrowing the gap with full fine-tuning. Code is publicly available at https://github.com/mrflogs/LoRA-Pro.
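
The equivalence stated in the abstract can be checked numerically on a toy problem. The sketch below is not taken from the paper's code. It assumes the common LoRA parameterization W = W0 + s·B·A, a least-squares loss, plain gradient descent, and a randomly initialized B (in practice B usually starts at zero), and every size and name in it is illustrative. It shows that one LoRA step changes W by the low-rank gradient s·(B·g_A + g_B·A) times the negative learning rate, up to a second-order term in the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, s, lr = 6, 5, 2, 2.0, 1e-3  # illustrative sizes, scale, and step (assumed)

W0 = rng.normal(size=(m, n))   # frozen pretrained weight
B = rng.normal(size=(m, r))    # random here so both terms are nonzero (assumption)
A = rng.normal(size=(r, n))
X = rng.normal(size=(n, 8))    # toy inputs
Y = rng.normal(size=(m, 8))    # toy targets


def full_grad(W):
    """Gradient of 0.5 * ||W X - Y||_F^2 with respect to the full matrix W."""
    return (W @ X - Y) @ X.T


W = W0 + s * B @ A
g = full_grad(W)               # full fine-tuning gradient at W

# Chain rule through W = W0 + s * B @ A gives the LoRA gradients.
g_A = s * B.T @ g
g_B = s * g @ A.T

# Low-rank gradient that LoRA implicitly applies to W.
g_tilde = s * (B @ g_A + g_B @ A)

# One gradient-descent step on A and B, then the resulting change in W.
A_new = A - lr * g_A
B_new = B - lr * g_B
delta_W = (W0 + s * B_new @ A_new) - W

# The only difference from -lr * g_tilde is a term of order lr**2.
second_order = s * lr**2 * g_B @ g_A
print(np.allclose(delta_W, -lr * g_tilde + second_order))
print(np.linalg.matrix_rank(g_tilde), "<=", 2 * r)
```

Because g_tilde is a sum of two matrices of rank at most r, its rank is at most 2r, which is why it can only approximate the full gradient g.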

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the performance gap between low-rank adaptation (LoRA) and full fine-tuning in parameter-efficient fine-tuning. Although LoRA is computationally efficient, its performance still falls short of full fine-tuning. The paper shows a fundamental connection between the optimization processes of LoRA and full fine-tuning and proposes a new method, LoRA-Pro, which narrows the gap by adjusting the gradients of the low-rank matrices.

### Main Contributions

1. **Revealing the Connection in the Optimization Process**: The paper shows that optimizing with LoRA is mathematically equivalent to full fine-tuning with a low-rank gradient, and that this gradient can be expressed in terms of the gradients of LoRA's two low-rank matrices.
2. **Proposing the LoRA-Pro Method**: By adjusting the gradients of the low-rank matrices, the method minimizes the difference between the low-rank gradient and the full fine-tuning gradient, thereby improving the performance of LoRA.
3. **Theoretical Proof and Experimental Validation**: The paper provides a closed-form solution for the optimal gradients and validates LoRA-Pro through extensive experiments on natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification.

### Experimental Results

- **Natural Language Understanding Tasks**: On the GLUE benchmark, LoRA-Pro achieved the highest average scores across multiple sub-tasks, outperforming standard LoRA and other variants.
- **Large-Scale Language Model Tasks**: In dialogue generation, mathematical reasoning, and code generation, LoRA-Pro performed well across multiple datasets, especially in high-rank settings, where its performance was close to or even surpassed full fine-tuning.
- **Image Classification Tasks**: On multiple image classification datasets, LoRA-Pro outperformed the other methods compared, including zero-shot classification and standard LoRA.

### Conclusion

LoRA-Pro narrows the performance gap between LoRA and full fine-tuning by adjusting the gradients of the low-rank matrices, and it shows strong performance across a range of tasks and models. These results support the effectiveness and robustness of the method.
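
The gradient adjustment described in the contributions can be illustrated with a generic least-squares solve. The sketch below does not reproduce the paper's closed-form solution. It assumes the parameterization W = W0 + s·B·A, uses a random matrix `g` as a stand-in for the full fine-tuning gradient, and the helper name `adjusted_gradients` is hypothetical. It finds g_A and g_B that minimize the Frobenius distance between s·(B·g_A + g_B·A) and g, then compares the result with the gradient that unmodified LoRA implies.

```python
import numpy as np


def adjusted_gradients(A, B, g, s):
    """Least-squares g_A, g_B minimizing ||s * (B g_A + g_B A) - g||_F.

    Hypothetical helper: it solves the problem numerically in vectorized
    (column-major) form instead of using a closed-form expression.
    """
    m, r = B.shape
    n = A.shape[1]
    # vec(B g_A) = (I_n kron B) vec(g_A);  vec(g_B A) = (A^T kron I_m) vec(g_B)
    M = s * np.hstack([np.kron(np.eye(n), B), np.kron(A.T, np.eye(m))])
    z, *_ = np.linalg.lstsq(M, g.flatten(order="F"), rcond=None)
    g_A = z[: r * n].reshape((r, n), order="F")
    g_B = z[r * n:].reshape((m, r), order="F")
    return g_A, g_B


rng = np.random.default_rng(0)
m, n, r, s = 6, 5, 2, 2.0          # illustrative sizes and scale (assumed)
A = rng.normal(size=(r, n))
B = rng.normal(size=(m, r))
g = rng.normal(size=(m, n))        # stand-in for the full fine-tuning gradient

# Low-rank gradient implied by unmodified LoRA (chain-rule gradients).
plain = s * (B @ (s * B.T @ g) + (s * g @ A.T) @ A)

# Low-rank gradient after the least-squares adjustment.
g_A, g_B = adjusted_gradients(A, B, g, s)
adjusted = s * (B @ g_A + g_B @ A)

err_plain = np.linalg.norm(plain - g)
err_adjusted = np.linalg.norm(adjusted - g)
print(err_adjusted <= err_plain)
```

The linear system here is rank-deficient, so the minimizing pair (g_A, g_B) is not unique; `np.linalg.lstsq` returns the minimum-norm one, while the resulting low-rank gradient is the same for every minimizer. The Kronecker formulation is chosen for clarity and scales poorly, so it is only suitable for small toy matrices.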

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

LoRA Learns Less and Forgets Less

Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs

LoRA+: Efficient Low Rank Adaptation of Large Models

Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

The Expressive Power of Low-Rank Adaptation

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation

GeoLoRA: Geometric integration for parameter efficient fine-tuning

AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information

LoRTA: Low Rank Tensor Adaptation of Large Language Models

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning