Abstract: The rapid expansion of large language models (LLMs) has underscored the need for parameter-efficient fine-tuning methods, with LoRA (Low-Rank Adaptation) emerging as a popular solution. Although LoRA reduces the number of trainable parameters, serving multiple task-specific or user-specific LoRA modules on top of a base model still creates significant storage challenges. To address this, and guided by a theoretical derivation, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel low-rank adaptation method that considerably reduces the number of trainable parameters while showing superior or competitive performance. LoRA-XS achieves this by inserting a small, trainable r × r weight matrix between frozen low-rank matrices, which are constructed by Singular Value Decomposition (SVD) of the original weight matrix. This lightweight matrix enables fine-tuning with drastically reduced storage requirements, making it feasible to deploy millions of personalized models while minimizing memory overhead. For instance, LoRA-XS reduces the number of trainable parameters by over 100x in 7B models compared to LoRA. Our evaluations across various benchmarks (including GLUE, GSM8K, MATH, and eight commonsense reasoning datasets) demonstrate that LoRA-XS performs competitively with or better than LoRA and other recent methods such as VeRA while being significantly more parameter-efficient. We also provide an extensive ablation study on the importance of singular vectors in transformer weights, shedding light on the underlying mechanisms driving LoRA-XS's enhanced efficiency. These findings suggest that LoRA-XS is not only a storage-efficient alternative, but also a powerful tool for scaling and personalizing LLMs at unprecedented scales.
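
The mechanism described in the abstract (a trainable r × r matrix placed between frozen factors obtained from an SVD of the original weight) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `make_lora_xs` and `forward` are hypothetical, the toy dimensions are arbitrary, the singular values are folded into the left factor, and the small matrix is initialised to zero so that the adapted layer starts out identical to the base layer.

```python
import numpy as np

def make_lora_xs(W, r):
    """Build frozen factors A (m x r) and B (r x n) from a truncated SVD of W,
    plus a small r x r matrix R that would be the only trainable part.
    Zero-initialising R is an assumption of this sketch."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # top-r left singular vectors, scaled by singular values (frozen)
    B = Vt[:r, :]          # top-r right singular vectors (frozen)
    R = np.zeros((r, r))   # trainable r x r matrix
    return A, R, B

def forward(x, W, A, R, B):
    """Base output plus the low-rank update x @ (A @ R @ B)."""
    return x @ W + ((x @ A) @ R) @ B

rng = np.random.default_rng(0)
m, n, r = 64, 48, 4                   # toy sizes, chosen only for illustration
W = rng.standard_normal((m, n))       # stands in for a pretrained weight matrix
A, R, B = make_lora_xs(W, r)
x = rng.standard_normal((2, m))

# With R = 0 the adapted layer reproduces the base layer exactly.
assert np.allclose(forward(x, W, A, R, B), x @ W)

# Trainable parameters per adapted matrix.
lora_params = r * (m + n)   # standard LoRA: two trainable factors, m x r and r x n
lora_xs_params = r * r      # this sketch: only the r x r matrix
print(lora_params, lora_xs_params)
```

In this sketch the trainable parameter count per adapted matrix is r² and does not depend on the dimensions of the weight matrix, whereas the LoRA count r(m + n) grows with them. The ratio for a real model depends on its layer sizes and the ranks chosen, so the toy numbers above should not be read as reproducing the reductions reported in the abstract.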

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers

MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning

Enhancing Scalability of Pre-trained Language Models Via Efficient Parameter Sharing

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

Sparse Low-rank Adaptation of Pre-trained Language Models

Mixture-of-Subspaces in Low-Rank Adaptation

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Scaling Pre-trained Language Models to Deeper Via Parameter-efficient Architecture

Higher Layers Need More LoRA Experts

Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning

AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality

When MOE Meets LLMs: Parameter Efficient Fine-tuning for Multi-task Medical Applications

SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture