A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Damjan Kalajdzievski

2023-11-28

Abstract: As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses a problem that arises when fine-tuning large language models (LLMs) with LoRA (Low-Rank Adapters): as the adapter rank increases, learning slows and performance stops improving. In LoRA, each low-rank adapter is multiplied by a scaling factor that depends on the rank \( r \). In conventional LoRA implementations this factor is inversely proportional to the rank (\( \gamma_r = \alpha / r \), with \( \alpha \) a constant hyperparameter), which causes the gradient to diminish as the adapter rank grows (a phenomenon described as "gradient collapse"). Higher-rank adapters therefore cannot make full use of their additional parameters, and in practice LoRA is typically restricted to very low ranks.

To address this, the paper proposes rank-stabilized LoRA (rsLoRA), in which the scaling factor is instead inversely proportional to the square root of the rank (\( \gamma_r = \alpha / \sqrt{r} \)). Through theoretical analysis and experimental validation, the authors show that this scaling factor stabilizes the learning process, so that performance remains good at higher ranks. Users can therefore choose a higher rank according to their available computational resources, trading increased training cost for better fine-tuning performance without changing the inference cost.
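The difference between the two scaling factors can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `lora_scale` and `adapted_forward`, the argument layout, and the matrix shapes are assumptions chosen for illustration, following the usual LoRA form in which the adapted weight is \( W + \gamma_r B A \).

```python
import math

import numpy as np


def lora_scale(alpha: float, rank: int, rank_stabilized: bool = False) -> float:
    """Return the adapter scaling factor gamma_r (hypothetical helper).

    Standard LoRA:  gamma_r = alpha / r
    rsLoRA:         gamma_r = alpha / sqrt(r)
    """
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def adapted_forward(x, W, A, B, alpha, rank_stabilized=False):
    """Apply a frozen weight W plus a scaled low-rank adapter B @ A.

    Assumed shapes: x (batch, d_in), W (d_out, d_in),
    A (r, d_in), B (d_out, r). The rank r is read from A.
    """
    rank = A.shape[0]
    gamma = lora_scale(alpha, rank, rank_stabilized)
    # Equivalent to x @ (W + gamma * B @ A).T, without forming the full update.
    return x @ W.T + gamma * (x @ A.T) @ B.T


# Compare how quickly each factor shrinks as the rank grows.
alpha = 16.0
for r in (4, 64, 1024):
    print(r, lora_scale(alpha, r), lora_scale(alpha, r, rank_stabilized=True))
```

With \( \alpha = 16 \), the standard factor falls from 4 at rank 4 to 0.015625 at rank 1024, while the rank-stabilized factor falls only from 8 to 0.5, which is the slower decay the summary above describes.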

LoRTA: Low Rank Tensor Adaptation of Large Language Models

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

LoRA+: Efficient Low Rank Adaptation of Large Models

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

LoRA Learns Less and Forgets Less

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

LoRA ensembles for large language model fine-tuning

FanLoRA: Fantastic LoRAs and Where to Find Them in Large Language Model Fine-tuning

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning

Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape