Abstract: Low-Rank Adaptation (LoRA) is the bread and butter of Large Language Model (LLM) finetuning. LoRA learns an additive low-rank perturbation, $AB$, of a pretrained matrix parameter $W$ to align the model to a new task or dataset with $W+AB$. We identify three core limitations of LoRA for finetuning--a setting that employs a limited amount of data and training steps. First, LoRA employs Dropout to prevent overfitting. We prove that Dropout is only suitable for long training episodes but fails to converge to a reliable regularizer for short training episodes. Second, LoRA's initialization of $B$ at $0$ creates a slow training dynamic between $A$ and $B$. That dynamic is exacerbated by Dropout, which further slows the escape of $B$ from $0$ and is particularly harmful for short training episodes. Third, the scaling factor multiplying each LoRA additive perturbation creates ``short-sighted'' interactions between the LoRA modules of different layers. Motivated by a principled analysis of those limitations, we find an elegant solution: a Dropout-free, scaling-free LoRA with Adaptive Learning rate, coined ALLoRA. By scaling the per-sample and per-parameter gradients with a coefficient inversely proportional to the parameters' $\ell_2$ norm, ALLoRA alleviates those three limitations. As a by-product, ALLoRA removes two hyper-parameters from LoRA: the scaling factor and the dropout rate. Empirical results show that ALLoRA achieves better accuracy than LoRA in various settings, including against recent LoRA variants such as Weight-Decomposed Low-Rank Adaptation (DoRA). Ablation studies show that our solution is optimal within a family of weight-dependent / output-dependent approaches on various LLMs, including the latest Llama3.
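To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation. The adapted weight $W+AB$ and the zero initialization of $B$ come from the abstract. The function names, the row-wise granularity of the norm, and the offset `eps` are hypothetical choices, since the abstract only says the coefficient is inversely proportional to the parameters' $\ell_2$ norm.

```python
import numpy as np


def lora_forward(x, W, A, B):
    """Apply the adapted weight W + A @ B to a batch of inputs x."""
    return x @ (W + A @ B)


def norm_scaled_grad(grad, param, eps=1.0):
    """Scale each row of `grad` by 1 / (||matching row of param||_2 + eps).

    Assumption: the row-wise norm and the additive offset `eps` are
    illustrative choices, not the exact coefficient used by ALLoRA.
    """
    row_norms = np.linalg.norm(param, axis=1, keepdims=True)
    return grad / (row_norms + eps)


rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2

W = rng.normal(size=(d_in, d_out))  # frozen pretrained weight
A = rng.normal(size=(d_in, rank))   # trainable low-rank factor
B = np.zeros((rank, d_out))         # trainable factor, initialized at 0
x = rng.normal(size=(4, d_in))      # a small batch of inputs

# With B = 0 the perturbation A @ B vanishes, so the adapted model
# starts out identical to the pretrained one.
starts_at_pretrained = np.allclose(lora_forward(x, W, A, B), x @ W)

# Rows of B with a small norm receive a larger effective step than rows
# with a large norm, which is the qualitative behavior the abstract describes.
dummy_grad = np.ones_like(B)
scaled_grad = norm_scaled_grad(dummy_grad, B)
```

In this sketch a row sitting at $0$ gets the largest coefficient (`1 / eps`), and the coefficient shrinks as the row's norm grows, which is one way to read how an adaptive learning rate could speed up the escape of $B$ from $0$ without Dropout or a separate scaling factor.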

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Structure-Aware Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation

ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers

Sparse Low-rank Adaptation of Pre-trained Language Models

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models

LoRTA: Low Rank Tensor Adaptation of Large Language Models

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning

LoRA ensembles for large language model fine-tuning

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape

Beyond fine-tuning: LoRA modules boost near-OOD detection and LLM security

SARA: Singular-Value Based Adaptive Low-Rank Adaption

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?