Abstract: Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computational and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, provides an efficient way to fine-tune models by optimizing only a low-rank matrix. Despite recent progress in improving LoRA's performance, the connection between the LoRA optimization space and the original full parameter space is often overlooked. A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance. In this paper, we propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space. Instead of relying on the well-established sharpness-aware minimization approach, which can incur significant computational and memory burdens, we utilize random weight perturbation with a Bayesian expectation loss objective to maintain training efficiency and design a refined perturbation generation strategy for improved performance. Experiments on natural language processing and image classification tasks with various architectures demonstrate the effectiveness of our approach.


### What problem does this paper attempt to solve?

This paper addresses the high computational and memory cost of fine-tuning large-scale pre-trained models. Specifically, it focuses on the Low-Rank Adaptation (LoRA) method. Although LoRA significantly reduces the number of trainable parameters and the training cost by optimizing only low-rank matrices, efforts to improve its performance have concentrated on the LoRA subspace and ignored its relationship with the original full parameter space.

**Key issues**:

1. **Disconnection between the LoRA subspace and the full parameter space**:
   - Solutions that appear flat in the LoRA subspace may lie in sharp regions of the full parameter space, which may harm generalization performance.
   - For example, as shown in Figure 1, a flat minimum in the LoRA subspace (blue curve) may exhibit a sharp direction in the full parameter space (red curve), thus hurting generalization performance.
2. **Limitations of existing methods**:
   - Previous works have attempted to improve LoRA performance by introducing more dedicated budgets, decomposing the optimization into direction and magnitude updates, and designing better initialization strategies, but most of these methods only consider optimization within the LoRA subspace.
   - The traditional Sharpness-Aware Minimization (SAM) method can effectively find flat minima, but it significantly increases training time and memory overhead, which makes it unsuitable for fine-tuning large-scale models.

### Solutions:

To solve the above problems, the paper proposes the **Flat-LoRA** method. Its main contributions include:

1. **Optimizing flatness in the full parameter space**:
   - Flat-LoRA optimizes the flatness of the loss landscape in the full parameter space where the low-rank adaptation resides, so that the merged weights lie in a flat region, thereby improving generalization performance.
2. **Using Bayesian expected loss optimization**:
   - To maintain training efficiency, Flat-LoRA applies random weight perturbation and optimizes a Bayesian expected loss objective:
     \[ \min_{A,B} \; \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)} \, L(W + s \cdot BA + \epsilon) \]
   - This objective acts as a smoothing filter on the loss in the full parameter space, which helps recover flatter minima, and it does not require additional gradient steps.
3. **Efficient random perturbation generation strategy**:
   - A new weight noise generation scheme is proposed that takes into account the filter structure and the input dimension, so that the variance introduced during forward propagation is independent of the input dimension:
     \[ \epsilon \sim \mathcal{N}\left(0, \; \frac{\sigma^2}{n} \operatorname{diag}\left(\|W'_{1,:}\|_2^2, \|W'_{2,:}\|_2^2, \cdots, \|W'_{m,:}\|_2^2\right) \mathbf{1}_{m \times n}\right) \]
     Here \( W' = W + s \cdot BA \) is the merged \( m \times n \) weight matrix, \( W'_{i,:} \) is its \( i \)-th row, and \( \mathbf{1}_{m \times n} \) is the all-ones matrix, so each entry \( \epsilon_{i,j} \) has variance \( \frac{\sigma^2}{n} \|W'_{i,:}\|_2^2 \).
4. **Experimental verification**:
   - Extensive experiments on natural language processing and computer vision tasks show that Flat-LoRA can achieve state-of-the-art performance across different architectures and can be easily integrated into existing methods to obtain consistent improvements.

Through these innovations, Flat-LoRA not only improves the performance of LoRA but also addresses the poor generalization that existing methods can suffer from when their solutions are viewed in the full parameter space.
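The expected-loss objective and the row-scaled noise described above can be illustrated with a small sketch. It is not the authors' implementation: it uses plain Python lists instead of a deep learning framework, reads the noise covariance element-wise (entry `(i, j)` gets variance `sigma^2 / n * ||W'_{i,:}||_2^2`, with `W' = W + s * BA`), and estimates the expectation by Monte Carlo sampling. All function names (`merged_weight`, `row_noise_std`, `sample_perturbation`, `expected_loss_estimate`) are hypothetical and chosen only for this example.

```python
import math
import random


def merged_weight(W, B, A, s):
    """Return W' = W + s * (B @ A) for matrices stored as nested lists."""
    m, n, r = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + s * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
        for i in range(m)
    ]


def row_noise_std(W_merged, sigma):
    """Per-row noise standard deviation: sigma / sqrt(n) * ||W'_{i,:}||_2."""
    n = len(W_merged[0])
    return [
        sigma / math.sqrt(n) * math.sqrt(sum(w * w for w in row))
        for row in W_merged
    ]


def sample_perturbation(W_merged, sigma, rng):
    """Draw eps with eps[i][j] ~ N(0, sigma^2 / n * ||W'_{i,:}||_2^2)."""
    n = len(W_merged[0])
    return [
        [rng.gauss(0.0, std) for _ in range(n)]
        for std in row_noise_std(W_merged, sigma)
    ]


def expected_loss_estimate(loss_fn, W, B, A, s, sigma, rng, num_samples=1):
    """Monte Carlo estimate of E_eps[ L(W + s * BA + eps) ]."""
    W_merged = merged_weight(W, B, A, s)
    total = 0.0
    for _ in range(num_samples):
        eps = sample_perturbation(W_merged, sigma, rng)
        perturbed = [
            [w + e for w, e in zip(w_row, e_row)]
            for w_row, e_row in zip(W_merged, eps)
        ]
        total += loss_fn(perturbed)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2 x 4 frozen weight with a rank-1 adapter (illustrative values only).
    W = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    B = [[0.5], [0.0]]
    A = [[0.0, 0.0, 2.0, 0.0]]

    def squared_norm(M):
        # Stand-in loss; a real fine-tuning run would use the task loss.
        return sum(v * v for row in M for v in row)

    print(row_noise_std(merged_weight(W, B, A, 1.0), sigma=0.1))
    print(expected_loss_estimate(squared_norm, W, B, A, 1.0, 0.1, rng, num_samples=100))
```

The sketch covers only the forward evaluation of the perturbed loss. In an actual training loop, the gradient of this perturbed loss with respect to `A` and `B` would drive the update, and the summary's remark that no additional gradient steps are needed suggests a single noise sample per step; both points are assumptions here, not details confirmed by the text above.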

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape
