Abstract: With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address these challenges, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
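The linear, increasing rank allocation mentioned in the abstract can be illustrated with a minimal sketch. The function name `linear_ranks`, the rounding rule, and the example values (12 layers, ranks from 4 to 12) are assumptions chosen for illustration; they are not taken from the abstract.

```python
def linear_ranks(num_layers, r_start, r_end):
    """Return one LoRA rank per layer, growing linearly from r_start to r_end.

    Hypothetical helper: rounding to the nearest integer is an assumption.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [r_end]
    step = (r_end - r_start) / (num_layers - 1)
    # Layer 0 is closest to the input; the last layer is closest to the output.
    return [round(r_start + i * step) for i in range(num_layers)]


# Illustrative values only: 12 layers, ranks rising from 4 to 12.
ranks = linear_ranks(12, 4, 12)
average_rank = sum(ranks) / len(ranks)
```

With these illustrative values the ranks average to 8, so the total budget matches what a uniform rank of 8 per layer would use, while the layers nearer the output receive the larger share.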

### What problems does this paper attempt to solve?

This paper aims to solve the efficiency and resource consumption problems encountered when fine-tuning large-scale pre-trained language models (PLMs). Specifically:

1. **Inefficiency of full-parameter fine-tuning**: As pre-trained models such as Llama and PaLM continue to grow, they contain hundreds of millions of parameters or more. Fine-tuning all of them becomes very inefficient, especially across multiple downstream tasks, where it leads to significant training and storage costs.
2. **Limitations of existing methods**: Existing parameter-efficient fine-tuning (PEFT) methods such as LoRA (Low-Rank Adaptation) significantly reduce the number of trainable parameters through low-rank matrix decomposition, but they do not account for differences in importance between layers. LoRA uses the same fixed rank in every layer, which may allocate the parameter budget poorly.
3. **Improving the effect and efficiency of fine-tuning**: To improve both, the paper proposes PRILoRA (Pruned and Rank-Increasing Low-Rank Adaptation), which allocates the budget more sensibly by linearly increasing the rank from layer to layer and by pruning during training.

### Main contributions of PRILoRA

1. **Linearly increasing rank distribution**: PRILoRA assigns a different rank to each layer in a linearly increasing manner, so that higher layers (those closer to the output) receive more adaptation capacity. This reflects the observation that higher layers in a Transformer model tend to need more adjustment.
2. **Importance-based continuous pruning**: Throughout training, PRILoRA prunes the low-rank matrix A according to an importance measure that combines the input activations and the weights. This not only removes unnecessary parameters but also improves the overall accuracy of the model.

### Experimental results

- **Benchmark tests**: PRILoRA achieves new best results on eight GLUE benchmark tasks, supporting its effectiveness.
- **Ablation experiments**: Comparing performance under different configurations verifies the contribution of both the linearly increasing rank distribution and the importance-based pruning.
- **Training costs**: PRILoRA adds almost no training time or GPU memory consumption while keeping the same number of trainable parameters as LoRA.

In summary, the paper addresses resource allocation and efficiency in the fine-tuning of large-scale pre-trained language models by introducing PRILoRA, improving both the effect and the efficiency of fine-tuning.
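The importance-based pruning of matrix A described above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: the score `|weight| * input statistic`, the exponential moving average of per-feature input norms, the per-row pruning, and the names `update_input_stats`, `prune_lora_a`, `momentum`, and `prune_fraction` are all assumptions. Plain Python lists are used to keep the sketch dependency-free; a real implementation would use a tensor library.

```python
def update_input_stats(stats, batch, momentum=0.9):
    """Update a running average of per-feature input L2 norms.

    stats: one running value per input feature.
    batch: list of input vectors (rows), each with len(stats) features.
    The exponential moving average is an assumed way to "accumulate" statistics.
    """
    n_features = len(stats)
    norms = [sum(row[j] ** 2 for row in batch) ** 0.5 for j in range(n_features)]
    return [momentum * s + (1.0 - momentum) * n for s, n in zip(stats, norms)]


def prune_lora_a(A, stats, prune_fraction=0.5):
    """Zero the lowest-importance entries in each row of A (shape r x d_in).

    Importance of A[i][j] is assumed to be |A[i][j]| * stats[j], so a small
    weight can survive if its input feature is consistently large.
    """
    pruned = []
    for row in A:
        scores = [abs(w) * s for w, s in zip(row, stats)]
        k = int(len(row) * prune_fraction)
        # Indices of the k entries with the smallest importance scores.
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example: the last input feature has a much larger accumulated norm.
A = [[0.5, -0.1, 0.3, 0.2]]
stats = [1.0, 1.0, 1.0, 10.0]
A_pruned = prune_lora_a(A, stats, prune_fraction=0.5)
```

In the toy example the weight 0.2 is kept even though 0.3 has a larger magnitude, because its input feature carries a much larger accumulated norm. That is the difference between this kind of score and plain magnitude pruning.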

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

LoRTA: Low Rank Tensor Adaptation of Large Language Models

Structure-Aware Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape

LoRA Learns Less and Forgets Less

LoRA: Low-Rank Adaptation of Large Language Models

Lottery Rank-Pruning Adaptation Parameter Efficient Fine-Tuning

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution

Sparse Low-rank Adaptation of Pre-trained Language Models

RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates