Abstract: The rapid expansion of large language models (LLMs) has underscored the need for parameter-efficient fine-tuning methods, with LoRA (Low-Rank Adaptation) emerging as a popular solution. Although LoRA reduces the number of trainable parameters, serving multiple task-specific or user-specific LoRA modules on top of a base model still creates significant storage challenges. To address this, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel low-rank adaptation method, motivated by a theoretical derivation, that considerably reduces the number of trainable parameters while showing superior or competitive performance. LoRA-XS achieves this by inserting a small, trainable $r \times r$ weight matrix between frozen low-rank matrices, which are constructed from the Singular Value Decomposition (SVD) of the original weight matrix. This lightweight matrix enables fine-tuning with drastically reduced storage requirements, making it feasible to deploy millions of personalized models while minimizing memory overhead. For instance, LoRA-XS reduces the number of trainable parameters by over 100x in 7B models compared to LoRA. Our evaluations across various benchmarks (including GLUE, GSM8K, MATH, and eight commonsense reasoning datasets) demonstrate that LoRA-XS performs competitively with or better than LoRA and other recent methods such as VeRA while being significantly more parameter-efficient. We also provide an extensive ablation study on the importance of singular vectors in transformer weights, shedding light on the underlying mechanisms driving LoRA-XS's enhanced efficiency. These findings suggest that LoRA-XS is not only a storage-efficient alternative, but also a powerful tool for scaling and personalizing LLMs at unprecedented scale.
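
The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice to fold the singular values into the left factor, the zero initialization of the $r \times r$ matrix, and the layer sizes in the parameter count are all assumptions made for illustration. The sketch only shows the general idea of a small trainable matrix placed between two frozen factors taken from a truncated SVD of the pretrained weight.

```python
import numpy as np


def lora_xs_init(W, r):
    """Build frozen factors from the truncated SVD of W, plus a trainable r x r matrix.

    Assumption: singular values are folded into the left factor, and the
    trainable matrix starts at zero so the adapted layer initially equals W.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # frozen, shape (d_out, r)
    B = Vt[:r, :]          # frozen, shape (r, d_in)
    R = np.zeros((r, r))   # the only trainable parameters
    return A, B, R


def adapted_forward(x, W, A, B, R):
    """Compute x @ (W + A @ R @ B).T without materializing the full update."""
    return x @ W.T + ((x @ B.T) @ R.T) @ A.T


def trainable_params_lora(d_in, d_out, r):
    """Standard LoRA trains two factors of shapes (d_out, r) and (r, d_in)."""
    return r * (d_in + d_out)


def trainable_params_lora_xs(r):
    """Only the r x r matrix is trained; the count is independent of layer size."""
    return r * r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((6, 4))   # hypothetical small weight matrix
    A, B, R = lora_xs_init(W, r=2)
    x = rng.standard_normal((3, 4))
    # With R = 0 the adapted layer reproduces the frozen layer.
    print(np.allclose(adapted_forward(x, W, A, B, R), x @ W.T))
    # Hypothetical 4096 x 4096 layer: LoRA at rank 8 vs. an r = 16 matrix.
    print(trainable_params_lora(4096, 4096, 8), trainable_params_lora_xs(16))
```

Because the trainable count depends only on $r$ and not on the layer dimensions, the per-module storage in this sketch stays at $r^2$ values however wide the base model is, which is the property the abstract relies on for serving many personalized adapters.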

Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

LoRTA: Low Rank Tensor Adaptation of Large Language Models

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

LoTR: Low Tensor Rank Weight Adaptation

LoRA Learns Less and Forgets Less

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws

LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

Run LoRA Run: Faster and Lighter LoRA Implementations

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates