PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

Nadav Benedek, Lior Wolf
2024-01-21
Abstract: With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address these challenges, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
Computation and Language, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses the efficiency and resource costs of fine-tuning large pre-trained language models (PLMs). Specifically:

1. **Inefficiency of full-parameter fine-tuning**:
   - Pre-trained models keep growing; models such as Llama and PaLM contain billions of parameters. Fine-tuning all of them is inefficient, especially across many downstream tasks, each of which incurs substantial training and storage costs.
2. **Limitations of existing methods**:
   - Existing parameter-efficient fine-tuning (PEFT) methods such as LoRA (Low-Rank Adaptation) greatly reduce the number of trainable parameters through low-rank matrix decomposition, but they do not account for the differing importance of individual layers. LoRA uses the same rank in every layer, which can allocate the parameter budget poorly.
3. **Improving the effect and efficiency of fine-tuning**:
   - The paper proposes PRILoRA (Pruned and Rank-Increasing Low-Rank Adaptation), which allocates the budget more sensibly and improves performance by increasing the rank linearly from layer to layer and by pruning during training.

### Main contributions of PRILoRA

1. **Linearly increasing rank allocation**:
   - PRILoRA assigns each layer a different rank, increasing linearly with depth, so that higher layers (closer to the output) receive more adaptation capacity. This reflects the observation that higher Transformer layers tend to need more adjustment during fine-tuning.
2. **Importance-based continuous pruning**:
   - Throughout training, PRILoRA prunes the low-rank matrix A using an importance measure that combines the current magnitude of each weight with accumulated statistics of the layer's input activations. This removes unneeded parameters and also improves the overall accuracy of the model.

### Experimental results

- **Benchmark tests**: PRILoRA sets a new state of the art in experiments on eight GLUE benchmarks.
- **Ablation experiments**: Comparing different configurations confirms that both the linearly increasing rank allocation and the importance-based pruning contribute to the result.
- **Training costs**: PRILoRA adds little training time or GPU memory overhead while keeping the same number of trainable parameters as LoRA.

In summary, the paper tackles resource allocation and efficiency in fine-tuning large pre-trained language models by introducing PRILoRA, which improves both the quality and the efficiency of fine-tuning.
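
The two mechanisms summarized above, linear rank allocation and importance-based pruning of the matrix A, can be illustrated with a small Python sketch. It is not the authors' implementation. The start and end ranks (4 and 12), the 12-layer depth, the exponential moving average used as the input statistic, the per-row pruning fraction, and every function name are assumptions chosen for the example.

```python
import math


def linear_ranks(num_layers, r_start, r_end):
    """Return one rank per layer, rising linearly from r_start to r_end.

    Hypothetical helper: the endpoints and the rounding rule are
    assumptions made for this sketch.
    """
    if num_layers == 1:
        return [r_end]
    step = (r_end - r_start) / (num_layers - 1)
    return [round(r_start + i * step) for i in range(num_layers)]


def update_input_stat(stat, batch, beta=0.9):
    """Accumulate a per-feature input statistic across training steps.

    Assumption: an exponential moving average of the L2 norm of each
    input feature over the batch (a list of input vectors).
    """
    norms = [math.sqrt(sum(x[j] ** 2 for x in batch)) for j in range(len(stat))]
    return [beta * s + (1 - beta) * n for s, n in zip(stat, norms)]


def prune_rows(A, stat, fraction):
    """Zero the least important entries in each row of A.

    Importance of entry (i, j) is |A[i][j]| * stat[j], so it combines the
    current weight magnitude with the accumulated input statistic.
    """
    pruned = []
    for row in A:
        scores = [abs(w) * s for w, s in zip(row, stat)]
        k = int(len(row) * fraction)  # number of entries to drop per row
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Ranks for an assumed 12-layer model, growing from 4 to 12.
ranks = linear_ranks(12, 4, 12)
print(ranks)  # [4, 5, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12]

# The average rank is 8, so the budget matches a uniform rank of 8.
print(sum(ranks) / len(ranks))  # 8.0

# A small weight on a feature with large inputs can survive pruning.
A = [[0.5, -0.1, 0.3, 0.2]]
stat = [1.0, 10.0, 1.0, 1.0]
print(prune_rows(A, stat, 0.5))  # [[0.5, -0.1, 0.0, 0.0]]
```

In this example the second weight (-0.1) has the smallest magnitude in its row but is kept, because its input feature carries the largest accumulated statistic. That is the practical difference between this kind of score and pruning by weight magnitude alone.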