AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models

Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, Peter Anthony Beerel
2024-04-17
Abstract: We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed Adaptive Freezing of Low Rank Adaptation (AFLoRA). Specifically, for each pre-trained frozen weight tensor, we add a parallel path of trainable low-rank matrices, namely a down-projection and an up-projection matrix, each of which is followed by a feature transformation vector. Based on a novel freezing score, we incrementally freeze these projection matrices during fine-tuning to reduce the computation and alleviate over-fitting. Our experimental results demonstrate that we can achieve state-of-the-art performance with an average improvement of up to $0.85\%$ as evaluated on the GLUE benchmark while yielding up to $9.5\times$ fewer average trainable parameters. When compared in terms of runtime, AFLoRA can yield up to $1.86\times$ improvement as opposed to similar PEFT alternatives. Besides the practical utility of our approach, we provide insights on the trainability requirements of LoRA paths at different modules and the freezing schedule for the different projection matrices. Code will be released.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to maintain or improve model performance in parameter-efficient fine-tuning (PEFT) of large pre-trained models while reducing the number of trainable parameters and the computational cost. Specifically, it proposes a new method, Adaptive Freezing of Low Rank Adaptation (AFLoRA), which pursues this goal by adaptively freezing the projection matrices in the low-rank paths.

### Main Issues

1. **Parameter Efficiency**: Existing PEFT methods such as LoRA and ELoRA reduce the number of trainable parameters, but they still incur computational overhead and may require a high rank to maintain performance.
2. **Overfitting**: Reducing the number of trainable parameters helps alleviate overfitting, but doing so without sacrificing performance is challenging.
3. **Computational Efficiency**: Beyond cutting the parameter count, the method should also reduce runtime and computational load.

### Solution

The paper proposes AFLoRA, with the following main contributions:

1. **Low-Rank Paths**: For each frozen weight tensor in the pre-trained model, a parallel low-rank path is added, consisting of a down-projection matrix and an up-projection matrix, each followed by a feature transformation vector.
2. **Adaptive Freezing**: These projection matrices are incrementally frozen during fine-tuning based on a novel freezing score, which reduces computation and alleviates overfitting.
3. **Performance Improvement**: Experimental results show that AFLoRA improves average performance on the GLUE benchmark by up to 0.85% while using up to 9.5× fewer average trainable parameters. It also improves runtime by up to 1.86× over similar PEFT alternatives and reduces computational load by 2.96×.

### Experimental Validation

The paper conducts extensive experiments on multiple NLP benchmark datasets, comparing AFLoRA with existing methods such as LoRA and ELoRA to validate its effectiveness.
The experimental results show that AFLoRA matches or outperforms existing methods in task performance while also being more efficient in both trainable parameters and computation.

### Conclusion

By adaptively freezing the projection matrices in the low-rank paths, AFLoRA maintains or even improves model performance while reducing the number of trainable parameters and the computational cost, offering a new approach to parameter-efficient fine-tuning of large pre-trained models.
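The adaptive-freezing step can be illustrated with a small tracker attached to each projection matrix. The summary does not spell out the paper's freezing score or schedule, so this sketch substitutes an assumed score: an exponential moving average of the mean |weight × gradient| sensitivity, compared against a fixed threshold after a warmup period. `FreezeTracker` and its parameters (`beta`, `threshold`, `warmup`) are hypothetical names chosen for illustration, not the authors' API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FreezeTracker:
    """Decides when to freeze one projection matrix (hypothetical sketch).

    Stand-in freezing score (an assumption, not the paper's formula): an
    exponential moving average (EMA) of the mean |weight * gradient|
    sensitivity. After `warmup` updates, the matrix is frozen as soon as the
    smoothed score falls below `threshold`, and it stays frozen afterwards.
    """
    beta: float = 0.9        # EMA smoothing factor
    threshold: float = 1e-3  # freeze when the smoothed score drops below this
    warmup: int = 5          # never freeze before this many updates
    ema: float = 0.0
    steps: int = 0
    frozen: bool = False

    def update(self, weights: List[float], grads: List[float]) -> bool:
        """Feed one step of flattened weights and gradients; return the frozen state."""
        if self.frozen:
            return True  # frozen matrices ignore further gradients
        sensitivity = sum(abs(w * g) for w, g in zip(weights, grads)) / len(weights)
        if self.steps == 0:
            self.ema = sensitivity  # initialize the EMA with the first observation
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * sensitivity
        self.steps += 1
        if self.steps >= self.warmup and self.ema < self.threshold:
            self.frozen = True
        return self.frozen


if __name__ == "__main__":
    # Gradients that halve every step mimic a projection matrix that has settled.
    tracker = FreezeTracker(beta=0.5, threshold=0.01, warmup=1)
    for step in range(1, 11):
        grad = 0.1 * 0.5 ** (step - 1)
        if tracker.update([1.0], [grad]):
            print(f"frozen at step {step}")  # prints "frozen at step 7"
            break
```

Freezing is one-way in this sketch: once a matrix is frozen, its weight gradients no longer need to be computed for the rest of fine-tuning. Using one tracker per projection matrix (down and up) lets each matrix freeze on its own schedule.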