Abstract: Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new paradigm. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning. Although it has demonstrated commendable performance, updating parameters within a single scale may not be the optimal choice for complex downstream tasks. In this paper, we extend LoRA to multiple scales, dubbed LoRA$^2$. We first combine orthogonal projection theory to train a set of LoRAs in two mutually orthogonal planes. Then, we improve the importance score algorithm, which reduces parameter sensitivity score calculations by approximately 98.5\%. Pruning singular values with lower importance scores enhances adaptability to various downstream tasks. Extensive experiments are conducted on two widely used pre-trained models to validate the effectiveness of LoRA$^2$. Results show that it significantly reduces the number of trainable parameters to just 0.72\% compared to full fine-tuning, while still delivering highly impressive performance. Even when the parameters are further reduced to 0.17M, it still achieves comparable results to the baseline with 8 times more parameters. Our code is available here: https://anonymous.4open.science/r/LoRA-2-5B4C
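
The parameter savings mentioned in the abstract follow from how a standard LoRA adapter is shaped: instead of updating a full d x k weight matrix, it trains two factors of sizes d x r and r x k. The sketch below only illustrates that arithmetic. The function names and the dimensions (d = k = 768, r = 8) are illustrative assumptions, not values taken from the paper, and the result is not meant to reproduce the reported 0.72% figure.

```python
def full_params(d, k):
    """Trainable entries when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d, k, r):
    """Trainable entries for a rank-r adapter: B is d x r and A is r x k."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 768, 768, 8
fraction = lora_params(d, k, r) / full_params(d, k)
print(f"{lora_params(d, k, r)} of {full_params(d, k)} entries ({fraction:.2%})")
```

For a single matrix, the trainable fraction is r(d + k) / (dk), so it shrinks as the rank r decreases relative to the layer dimensions.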

What problem does this paper attempt to address?

The paper addresses the problem of efficiently fine-tuning large pre-trained language models for downstream tasks. It proposes LoRA$^2$, a multi-scale low-rank approximation method that improves on existing techniques in three ways:

1. **Multi-Scale Orthogonal Low-Rank Approximation**: LoRA$^2$ draws on orthogonal projection theory to train multiple sets of LoRAs on two mutually orthogonal planes. Dual regularization enforces orthogonality between these LoRAs, which expands the model's learning space.
2. **Improvement of Importance Score Algorithm**: The paper adapts the importance score algorithm to the structure of LoRA$^2$ and reduces the computation of parameter sensitivity scores by approximately 98.5% without affecting performance. This allows LoRA$^2$ to allocate its parameter budget dynamically for different downstream tasks.
3. **Enhancement of Parameter Efficiency**: LoRA$^2$ trains only 0.72% of the parameters that full fine-tuning requires. Even when reduced further to 0.17M parameters, its performance remains comparable to a baseline with 8 times as many parameters.

The experiments validate LoRA$^2$ on natural language understanding tasks, including multiple tasks in the GLUE benchmark. LoRA$^2$ outperforms existing parameter-efficient fine-tuning methods on most tasks and performs well across different model scales, including DeBERTaV3-base and RoBERTa-large.

In summary, the main goal of LoRA$^2$ is to improve parameter efficiency during fine-tuning through multi-scale low-rank approximation and an improved importance score algorithm, thereby achieving effective adaptation to downstream tasks.
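
The summary above mentions two mechanisms without spelling them out: a regularizer that keeps the adapter factors orthogonal, and pruning of singular values by importance score. The pure-Python sketch below shows one generic way such pieces can look, in the style of SVD-parameterized adapters such as AdaLoRA. It is an assumption for illustration, not the paper's formulation: the names `orthogonality_penalty` and `prune_singular_values`, the Frobenius-norm penalty, and the top-`budget` selection rule are all hypothetical here, and how the importance scores themselves are computed is left out because the text does not specify it.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def orthogonality_penalty(P, Q):
    """Return ||P^T P - I||_F^2 + ||Q Q^T - I||_F^2 for P (d x r) and Q (r x k).

    Assumed form of the regularizer; it is zero exactly when the columns of P
    and the rows of Q are orthonormal.
    """
    def gram_gap(G):
        # Squared Frobenius distance between a square matrix and the identity.
        n = len(G)
        return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                   for i in range(n) for j in range(n))

    return gram_gap(matmul(transpose(P), P)) + gram_gap(matmul(Q, transpose(Q)))


def prune_singular_values(lam, scores, budget):
    """Keep the `budget` singular values with the highest scores; zero the rest."""
    ranked = sorted(range(len(lam)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:max(budget, 0)])
    return [value if i in keep else 0.0 for i, value in enumerate(lam)]


# Toy example with made-up numbers.
P = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # orthonormal columns
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # orthonormal rows
print(orthogonality_penalty(P, Q))
print(prune_singular_values([0.5, 2.0, 1.0, 0.1], [0.9, 0.1, 0.5, 0.7], budget=2))
```

In a training loop, a penalty like this would typically be added to the task loss with a small weight, and the pruning step would be applied periodically so that the rank budget shifts toward the directions scored as most important.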

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

Sparse Low-rank Adaptation of Pre-trained Language Models

LoRTA: Low Rank Tensor Adaptation of Large Language Models

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs

Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

LoRA ensembles for large language model fine-tuning