Abstract: The rapid advancement of large language models (LLMs) comes with a significant increase in their parameter counts, which makes adaptation and fine-tuning challenging. Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt LLMs to downstream tasks efficiently. In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. We introduce a method for analyzing parameter variation through singular value decomposition (SVD) and use it to analyze SORSA's advantage in minimizing the alteration of the pre-trained weights from the SVD perspective. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$ and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. Both parts are initialized by performing SVD on the pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove can decrease the condition number of $W_p$ and make optimization more efficient. SORSA adapters can be merged into the base weights for inference, eliminating any inference latency. In our experiments, SORSA converges faster than PiSSA and LoRA. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), Full FT (49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), Full FT (7.22%), and PiSSA (7.44%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance. The code is available at https://github.com/Gunale0926/SORSA.
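The SVD split and the inference-time merge described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sorsa_init` and `merge`, the random 64-by-48 matrix standing in for a pre-trained weight, the rank `r = 8`, the simulated update, and the `y = x @ W.T` layer convention are all hypothetical choices made for the example.

```python
import numpy as np


def sorsa_init(W: np.ndarray, r: int):
    """Hypothetical helper: split W into a rank-r principal part and a residual.

    Returns U_p (m x r), S_p (r,), Vt_p (r x n), and the frozen residual W_r (m x n).
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Trainable principal part: the r largest singular triplets.
    U_p, S_p, Vt_p = U[:, :r], S[:r], Vt[:r, :]
    # Frozen residual: the remaining singular triplets.
    W_r = U[:, r:] @ np.diag(S[r:]) @ Vt[r:, :]
    return U_p, S_p, Vt_p, W_r


def merge(U_p, S_p, Vt_p, W_r):
    """Hypothetical helper: fold the adapter back into one dense matrix."""
    # (U_p * S_p) scales column i of U_p by S_p[i], i.e. U_p @ diag(S_p).
    return W_r + (U_p * S_p) @ Vt_p


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # stand-in for a pre-trained weight matrix
U_p, S_p, Vt_p, W_r = sorsa_init(W, r=8)

# At initialization W_p + W_r reproduces W (up to floating-point error).
print(np.allclose(merge(U_p, S_p, Vt_p, W_r), W))  # True

# Trainable parameters: m*r + r + r*n = 512 + 8 + 384.
n_trainable = U_p.size + S_p.size + Vt_p.size
print(n_trainable)  # 904

# Simulate a training update to the principal factors (W_r stays frozen).
U_p = U_p + 0.01 * rng.standard_normal(U_p.shape)
S_p = S_p * 1.05

# Training-time forward pass: frozen residual path plus the adapter path.
x = rng.standard_normal((4, 48))
y_train = x @ W_r.T + ((x @ Vt_p.T) * S_p) @ U_p.T
# Inference-time forward pass: a single matmul with the merged weight.
y_merged = x @ merge(U_p, S_p, Vt_p, W_r).T
print(np.allclose(y_train, y_merged))  # True
```

Because the merged matrix has the same shape as the original weight, serving the adapted model costs one dense matmul per layer, which is the sense in which merging removes inference latency.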

What problem does this paper attempt to address?

The paper addresses the problem of adapting and fine-tuning large language models (LLMs) efficiently as their parameter counts grow rapidly. It proposes a new parameter-efficient fine-tuning (PEFT) method called Singular Values and Orthonormal Regularized Singular Vectors Adaptation (SORSA). SORSA uses singular value decomposition (SVD) to split each pre-trained weight matrix into a principal component and a residual component, then trains only the singular values and singular vectors of the principal component while keeping the residual frozen. The paper also introduces an orthonormal regularizer that keeps the singular vectors orthonormal during training, so that $U_p$ and $V_p$ remain valid singular-vector bases and $S_p$ can still be read as singular values, which keeps parameter updates efficient. The main contributions of the paper are:

1. **Proposing the SORSA method**: decomposing pre-trained weights into principal and residual components with SVD, training only the principal singular values and singular vectors, and freezing the residual.
2. **Introducing an orthonormal regularizer**: keeping the singular vectors orthonormal during training to improve optimization efficiency.
3. **Experimental validation**: showing that SORSA converges faster and reaches higher accuracy than existing PEFT methods such as LoRA and PiSSA on several natural language processing benchmarks, including GSM-8K, MATH, and HumanEval.
4. **Theoretical analysis**: a mathematical study of SORSA's optimization properties, including the convexity of the regularizer, the Lipschitz continuity of the gradient, and the improvement the regularizer brings to the condition number of the optimization problem.

In summary, the paper presents SORSA as a new perspective on the efficient adaptation and fine-tuning of large language models, and reports both faster convergence and higher benchmark accuracy than LoRA, PiSSA, and full fine-tuning.
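The effect of the orthonormal regularizer can be illustrated in the same NumPy style. This sketch assumes a squared-Frobenius penalty $\|U_p^\top U_p - I\|_F^2$ (the paper's exact form and weighting may differ), applies it to $U_p$ only ($V_p$ would be handled identically through the transpose of `Vt_p`), and runs regularizer-only gradient steps with no task loss, to isolate what the penalty does. The names `orth_reg` and `orth_reg_grad`, the drift scale, and the step size `gamma` are hypothetical.

```python
import numpy as np


def orth_reg(M: np.ndarray) -> float:
    """Assumed penalty: squared Frobenius distance of M^T M from the identity."""
    G = M.T @ M - np.eye(M.shape[1])
    return float(np.sum(G * G))


def orth_reg_grad(M: np.ndarray) -> np.ndarray:
    """Gradient of orth_reg with respect to M: 4 M (M^T M - I)."""
    return 4.0 * M @ (M.T @ M - np.eye(M.shape[1]))


rng = np.random.default_rng(0)
m, r = 64, 8
# Orthonormal columns, as U_p has right after SVD initialization.
Q, _ = np.linalg.qr(rng.standard_normal((m, r)))
# Simulate drift away from orthonormality caused by unconstrained updates.
U_p = Q + 0.02 * rng.standard_normal((m, r))

reg_before, cond_before = orth_reg(U_p), np.linalg.cond(U_p)

# Near sigma = 1 each step shrinks a singular value's deviation from 1
# by a factor |1 - 8 * gamma|, so gamma must stay below 0.25.
gamma = 0.05
for _ in range(200):
    U_p = U_p - gamma * orth_reg_grad(U_p)

reg_after, cond_after = orth_reg(U_p), np.linalg.cond(U_p)
print(f"penalty:          {reg_before:.4f} -> {reg_after:.1e}")
print(f"condition number: {cond_before:.4f} -> {cond_after:.6f}")
```

Writing $U_p = P \Sigma R^\top$ shows that each step maps every singular value $\sigma$ of $U_p$ to $\sigma - 4\gamma\sigma(\sigma^2 - 1)$ while leaving $P$ and $R$ unchanged, so the iteration drives all singular values, and hence the condition number $\kappa(U_p)$, toward 1. Since $\kappa(W_p) \le \kappa(U_p)\,\kappa(\text{diag}(S_p))\,\kappa(V_p)$, restoring orthonormality tightens this bound back to $\max(S_p)/\min(S_p)$.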

SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

SARA: Singular-Value Based Adaptive Low-Rank Adaption

SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation

SBoRA: Low-Rank Adaptation with Regional Weight Updates

Parameter-Efficient Fine-Tuning of State Space Models

Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning

Sparsity-Accelerated Training for Large Language Models

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

LoRA ensembles for large language model fine-tuning

LoRA-GA: Low-Rank Adaptation with Gradient Approximation

RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation