Abstract: In the training of large language models, parameter-efficient techniques such as LoRA reduce memory usage and communication overhead during the fine-tuning phase. However, applying such techniques directly during pre-training results in poor performance, primarily because introducing low-rank training prematurely significantly reduces model accuracy. Existing methods such as ReLoRA and GaLore have attempted to address this challenge by updating the low-rank subspace. However, they still fall short of the accuracy of full-rank training because they must limit the update frequency to keep optimizer states consistent, which hinders their ability to closely approximate full-rank training behavior. In this paper, we introduce SwitchLoRA, a parameter-efficient training technique that frequently and smoothly replaces the trainable parameters of LoRA adapters with alternative parameters. SwitchLoRA updates the low-rank subspace incrementally, targeting only a few dimensions at a time to minimize the impact on optimizer states. This permits a higher update frequency, which improves accuracy by letting the updated parameters more closely mimic full-rank behavior during pre-training. Our results demonstrate that SwitchLoRA actually surpasses full-rank training on the LLaMA 1.3B model, reducing perplexity from 15.23 to 15.01 while also reducing communication overhead by 54%. Furthermore, after full fine-tuning of both the SwitchLoRA pre-trained model and the full-rank pre-trained model on the GLUE benchmark, the SwitchLoRA pre-trained model showed an average accuracy gain of about 1% over the full-rank pre-trained model. This demonstrates the enhanced generalization and reasoning capabilities of SwitchLoRA.
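The abstract describes the switching mechanism only at a high level. As a rough illustration, the plain-Python sketch below shows one way a single rank dimension of a LoRA adapter (a layer computing (W + B A) x with W frozen) could be swapped for a candidate vector without changing the layer's output at the moment of the switch. It is not the authors' implementation. The candidate pool, folding the displaced component into the frozen W, the choice of which optimizer moments to reset, and the names `switch_dimension` and `switch_step` are all assumptions made for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def effective_weight(W: Matrix, B: Matrix, A: Matrix) -> Matrix:
    """Weight the layer actually applies: W + B @ A."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


def switch_dimension(W, B, A, i, candidate, moments_B, moments_A):
    """Hypothetical helper: replace column i of B by `candidate`, keeping W + B @ A fixed.

    Assumption: moments_B / moments_A are lists of Adam-style state matrices
    (e.g. [m, v]) shaped like B and A; only entries tied to dimension i are reset.
    Returns the displaced column so the caller can return it to the pool.
    """
    old = [row[i] for row in B]
    for r in range(len(B)):
        diff = old[r] - candidate[r]
        # Fold (b_old - b_new) a_i^T into the frozen W so W + B @ A is unchanged.
        for c in range(len(A[i])):
            W[r][c] += diff * A[i][c]
        B[r][i] = candidate[r]
        # The new column starts with fresh optimizer moments.
        for state in moments_B:
            state[r][i] = 0.0
    for state in moments_A:
        # The gradient of A's row i depends on B's column i, so its moments are stale.
        state[i] = [0.0] * len(A[i])
    return old


def switch_step(W, B, A, pool, num_switch, moments_B, moments_A, rng):
    """Hypothetical schedule: switch only `num_switch` random rank dimensions per call."""
    for i in rng.sample(range(len(A)), num_switch):
        j = rng.randrange(len(pool))
        # Swap: the candidate goes into B, the displaced column goes back into the pool.
        pool[j] = switch_dimension(W, B, A, i, pool[j], moments_B, moments_A)
```

Because each call touches only `num_switch` dimensions and resets only their moments, the rest of the optimizer state stays consistent. Under the stated assumptions, this mirrors the abstract's point that incremental subspace updates allow switching to happen much more often.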

Unlocking the Global Synergies in Low-Rank Adapters

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

HyperLoRA: Efficient Cross-task Generalization Via Constrained Low-Rank Adapters Generation

LoRA+: Efficient Low Rank Adaptation of Large Models

LoRA²: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

Sparse Low-rank Adaptation of Pre-trained Language Models

Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients

ResLoRA: Identity Residual Mapping in Low-Rank Adaption

GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

LoRA Learns Less and Forgets Less

A Note on LoRA

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

The Expressive Power of Low-Rank Adaptation

LoRTA: Low Rank Tensor Adaptation of Large Language Models