Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding

2024-07-16

Abstract: Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces challenges of suboptimal performance and overfitting. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper primarily addresses the resource challenges and efficiency issues faced during the fine-tuning of large-scale pre-trained models (such as large language models) and proposes an improved method to enhance parameter efficiency and generalization ability. Specifically, the paper tackles the following two core issues:

1. **Suboptimal Performance Issue**: Existing Low-Rank Adaptation (LoRA) methods, while effectively reducing the number of parameters during the fine-tuning process, may not achieve optimal performance in certain cases, especially when dealing with high-dimensional embeddings.
2. **Overfitting Issue**: Fine-tuning large-scale pre-trained models often leads to overfitting, which results in poor performance on test data.

To address the above issues, the authors propose the Regularized and Masked LoRA (RM-LoRA) method. This method enhances the performance of LoRA through two key strategies:

- **Regularization Technique**: Encourages the LoRA matrix to achieve a higher intrinsic rank within its parameter space, thereby helping to improve the intrinsic dimensions of LoRA updates.
- **Gradient Masking Method**: Randomly masks a portion of the parameters for updates instead of updating all parameters, which helps control the number of parameters while promoting the growth of intrinsic rank.

Experimental results show that compared to the original LoRA method and its variants, the RM-LoRA method achieves better generalization performance under the same or lower trainable parameter budget. These conclusions are drawn from empirical analyses on multiple open-source vision and language datasets.
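The two strategies summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the summary does not give the exact form of the regularizer or the masking scheme, so the orthogonality penalty and the element-wise random mask below are assumptions chosen for illustration, and the names `lora_param_count`, `orthogonality_penalty`, and `masked_sgd_step` are hypothetical.

```python
import random


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in a LoRA update B @ A (B: d_out x rank, A: rank x d_in)."""
    return rank * (d_out + d_in)


def orthogonality_penalty(A):
    """Squared Frobenius norm of (A A^T - I) for a rank x d_in matrix A (list of rows).

    ASSUMPTION: this is one possible rank-encouraging regularizer, not
    necessarily the one used in the paper. It is zero only when the rows
    of A are orthonormal, which implies A has full row rank.
    """
    r = len(A)
    total = 0.0
    for i in range(r):
        for j in range(r):
            gram_ij = sum(x * y for x, y in zip(A[i], A[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


def masked_sgd_step(W, grad, lr, keep_prob, rng):
    """One SGD step in which each entry is updated only with probability keep_prob.

    ASSUMPTION: the mask is drawn independently per entry; the paper's
    masking granularity may differ. Returns the new matrix and the number
    of entries that were actually updated.
    """
    new_W, updated = [], 0
    for w_row, g_row in zip(W, grad):
        new_row = []
        for w, g in zip(w_row, g_row):
            if rng.random() < keep_prob:
                new_row.append(w - lr * g)  # entry selected: apply gradient
                updated += 1
            else:
                new_row.append(w)  # entry masked: keep old value
        new_W.append(new_row)
    return new_W, updated


if __name__ == "__main__":
    # A rank-8 adapter on a 768 x 768 layer trains 12288 values
    # instead of the 589824 entries of the full weight matrix.
    print(lora_param_count(768, 768, 8))

    # Orthonormal rows incur no penalty; duplicated rows (rank 1) do.
    print(orthogonality_penalty([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
    print(orthogonality_penalty([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]))

    # With keep_prob = 0.5, roughly half of the entries move per step.
    rng = random.Random(0)
    W = [[1.0] * 4 for _ in range(4)]
    grad = [[1.0] * 4 for _ in range(4)]
    W, n_updated = masked_sgd_step(W, grad, lr=0.1, keep_prob=0.5, rng=rng)
    print(n_updated)
```

In this sketch the penalty would be added to the task loss so that collapsed (low-rank) factors are discouraged, while the mask limits how many entries change in any one step, which is one way to read the "same or lower trainable parameter budget" claim.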
