Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding
2024-07-16
Abstract: Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces challenges of suboptimal performance and overfitting. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.
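To make the idea of a low-rank update concrete, here is a minimal pure-Python sketch, assuming the standard LoRA parameterization ΔW = BA with B of shape d × r and A of shape r × k. It uses ordinary matrix rank as a stand-in for the paper's "intrinsic dimension" (an assumption; the paper may measure it differently), and the helper names `lora_delta` and `matrix_rank` are hypothetical. B is filled with random values to mimic a trained update; in standard LoRA, B starts at zero.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]  # work on a copy
    rows, cols = len(M), len(M[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        # Pick the remaining row with the largest entry in column c.
        pivot = max(range(rank, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot: column c depends on earlier pivot columns
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, rows):
            factor = M[i][c] / M[rank][c]
            for j in range(c, cols):
                M[i][j] -= factor * M[rank][j]
        rank += 1
    return rank


def lora_delta(d, k, r, seed=0):
    """Hypothetical helper: Delta W = B @ A with B (d x r) and A (r x k).

    B is random here to mimic a trained update; standard LoRA initializes B to zero.
    """
    rng = random.Random(seed)
    B = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]
    A = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(r)]
    return matmul(B, A)


delta = lora_delta(d=8, k=6, r=2)
print(matrix_rank(delta))  # bounded above by r = 2
```

Whatever values B and A take, rank(ΔW) ≤ r, so the rank budget r is an upper bound on the dimension of the update that LoRA can express.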
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the resource cost and inefficiency of fine-tuning large pre-trained models (such as large language models) and proposes an improved method that enhances parameter efficiency and generalization. Specifically, it tackles two core issues:

1. **Suboptimal performance**: Existing Low-Rank Adaptation (LoRA) methods greatly reduce the number of trainable parameters during fine-tuning, but they can fall short of optimal performance in some cases, especially when dealing with high-dimensional embeddings.
2. **Overfitting**: Fine-tuning large pre-trained models often leads to overfitting, which degrades performance on test data.

To address these issues, the authors propose Regularized and Masked LoRA (RM-LoRA), which improves LoRA through two key strategies:

- **Regularization**: Encourages the LoRA matrices to reach a higher intrinsic rank within their parameter space, thereby increasing the intrinsic dimension of the LoRA updates.
- **Gradient masking**: Updates only a randomly selected portion of the parameters instead of all of them, which helps keep the trainable parameter budget in check while promoting growth of the intrinsic rank.

Experimental results show that RM-LoRA achieves better generalization than the original LoRA method and its variants under the same or lower trainable parameter budget. These conclusions are drawn from empirical evaluations on multiple open-source vision and language datasets.
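The two strategies can be illustrated on a single LoRA factor with a minimal pure-Python sketch. The text above does not specify the exact regularizer, so this sketch swaps in an orthogonality penalty ||A A^T - I||_F^2 (a common way to keep the r rows of A from collapsing onto each other, not necessarily the paper's formulation), and it assumes the gradient mask is a fresh Bernoulli draw at every step. All function names and hyperparameter values (`lr`, `reg_weight`, `keep_prob`) are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def orthogonality_penalty(A):
    """Assumed regularizer ||A A^T - I||_F^2, zero when the rows of A are orthonormal."""
    G = matmul(A, transpose(A))
    n = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


def orthogonality_penalty_grad(A):
    """Gradient of the penalty above with respect to A: 4 (A A^T - I) A."""
    G = matmul(A, transpose(A))
    for i in range(len(G)):
        G[i][i] -= 1.0  # G now holds A A^T - I
    return [[4.0 * v for v in row] for row in matmul(G, A)]


def masked_regularized_step(A, task_grad, lr=0.1, reg_weight=0.01,
                            keep_prob=0.5, rng=None):
    """One SGD step on A where each entry is updated only if its mask bit is set.

    The total gradient is the task gradient plus the weighted regularizer
    gradient; masked-out entries keep their current value.
    """
    rng = rng or random.Random(0)
    reg_grad = orthogonality_penalty_grad(A)
    new_A, mask = [], []
    for i, row in enumerate(A):
        new_row, mask_row = [], []
        for j, a in enumerate(row):
            keep = rng.random() < keep_prob  # fresh Bernoulli(keep_prob) draw
            g = task_grad[i][j] + reg_weight * reg_grad[i][j]
            new_row.append(a - lr * g if keep else a)
            mask_row.append(keep)
        new_A.append(new_row)
        mask.append(mask_row)
    return new_A, mask


# Usage: a 2 x 4 factor A (rank budget r = 2) with a placeholder task gradient.
rng = random.Random(42)
A = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
task_grad = [[1.0] * 4 for _ in range(2)]  # stand-in for a real loss gradient
new_A, mask = masked_regularized_step(A, task_grad, rng=rng)
```

Masked-out entries come back unchanged, so each step touches only about a `keep_prob` fraction of the factor's entries; in a full LoRA layer the same kind of step would also be applied to B.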