SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

Yang Cao
2024-10-03
Abstract: The rapid advancement in large language models (LLMs) comes with a significant increase in their parameter size, presenting challenges for adaptation and fine-tuning. Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt LLMs for downstream tasks efficiently. In this paper, we propose Singular Values and Orthonormal Regularized Singular Vectors Adaptation, or SORSA, a novel PEFT method. We introduce a method to analyze the variation of the parameters by performing singular value decomposition (SVD), and we analyze SORSA's advantage in minimizing the alteration of the weights as seen through their SVD. Each SORSA adapter consists of two main parts: trainable principal singular weights $W_p = U_p \text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$. These parts are initialized by performing SVD on pre-trained weights. Moreover, we implement and analyze an orthonormal regularizer, which we prove can decrease the condition number of $W_p$ and make the optimization more efficient. SORSA adapters can be merged during inference, thus eliminating any inference latency. In our experiments, SORSA converges faster than PiSSA and LoRA. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA achieved 56.03% accuracy, surpassing LoRA (42.30%), Full FT (49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36% accuracy, outperforming LoRA (5.50%), Full FT (7.22%), and PiSSA (7.44%). We conclude that SORSA offers a new perspective on parameter-efficient fine-tuning, demonstrating remarkable performance. The code is available at https://github.com/Gunale0926/SORSA.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper addresses how to adapt and fine-tune large language models (LLMs) efficiently as their parameter counts grow. It proposes a new parameter-efficient fine-tuning (PEFT) method, Singular Values and Orthonormal Regularized Singular Vectors Adaptation (SORSA). SORSA uses singular value decomposition (SVD) to split the pre-trained weights into a principal part and a residual part, trains only the singular values and singular vectors of the principal part, and freezes the residual part. An orthonormal regularizer keeps the singular vectors orthonormal during training, which makes parameter updates more efficient and preserves the integrity of the singular values. The main contributions of the paper are:

1. **The SORSA method**: Each adapter consists of trainable principal weights $W_p = U_p \text{diag}(S_p) V^\top_p$ and frozen residual weights $W_r = U_r \text{diag}(S_r) V^\top_r$, both initialized by performing SVD on the pre-trained weights.
2. **An orthonormal regularizer**: It maintains the orthonormality of the singular vectors during training and improves optimization efficiency.
3. **Experimental validation**: SORSA shows faster convergence and higher accuracy than existing PEFT methods such as LoRA and PiSSA on benchmarks including GSM-8K, MATH, and HumanEval.
4. **Theoretical analysis**: The paper analyzes the optimization properties of SORSA, including the convexity of the regularizer, the Lipschitz continuity of the gradient, and the improvement the regularizer brings to the condition number of the optimization problem.
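The principal/residual split described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_principal_residual` and `merge`, the toy 8×6 matrix, and the rank `r=2` are all hypothetical choices made for the example.

```python
import numpy as np

def split_principal_residual(W, r):
    """Split W into rank-r principal factors and a residual matrix via SVD.

    Hypothetical helper for illustration; names are not taken from the paper's code.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Principal part: the top-r singular triplets (these would be trainable).
    U_p, S_p, Vt_p = U[:, :r], S[:r], Vt[:r, :]
    # Residual part: everything else, collapsed into one matrix (this would be frozen).
    W_r = U[:, r:] @ np.diag(S[r:]) @ Vt[r:, :]
    return (U_p, S_p, Vt_p), W_r

def merge(U_p, S_p, Vt_p, W_r):
    """Fold the adapter back into a single dense weight matrix."""
    return U_p @ np.diag(S_p) @ Vt_p + W_r

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))  # stand-in for a pre-trained weight matrix

(U_p, S_p, Vt_p), W_r = split_principal_residual(W, r=2)
W_merged = merge(U_p, S_p, Vt_p, W_r)

# Before any training step, merging must reproduce the original weights.
reconstruction_error = np.abs(W_merged - W).max()
# Number of trainable scalars in the principal factors: 8*2 + 2 + 2*6.
trainable = U_p.size + S_p.size + Vt_p.size
```

Because `merge` returns a single matrix with the same shape as `W`, a layer using the merged weights does the same amount of work at inference as the original layer, which is the sense in which merging removes inference latency.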
In summary, the paper presents SORSA as a new perspective on efficient adaptation and fine-tuning of LLMs. With Llama 2 7B, it reports 56.03% accuracy on GSM-8K and 10.36% on MATH, ahead of LoRA, PiSSA, and full fine-tuning on both benchmarks.
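To make the idea of an orthonormal regularizer concrete, the sketch below penalizes the distance of $U_p^\top U_p$ and $V_p^\top V_p$ from the identity in Frobenius norm. The exact formula is an assumption: this text does not state the paper's regularizer, so this is one common way to write such a penalty, and `orthonormal_penalty` is a hypothetical name.

```python
import numpy as np

def orthonormal_penalty(U_p, Vt_p):
    """Assumed penalty: ||U_p^T U_p - I||_F + ||V_p^T V_p - I||_F.

    Vt_p holds V_p^T (shape r x n), so V_p^T V_p is computed as Vt_p @ Vt_p.T.
    """
    r = U_p.shape[1]
    identity = np.eye(r)
    return (np.linalg.norm(U_p.T @ U_p - identity, "fro")
            + np.linalg.norm(Vt_p @ Vt_p.T - identity, "fro"))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))  # stand-in for a pre-trained weight matrix
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_p, Vt_p = U[:, :2], Vt[:2, :]

# Singular vectors from SVD are orthonormal, so the penalty starts near zero.
penalty_at_init = orthonormal_penalty(U_p, Vt_p)

# Simulate an unconstrained update that pushes U_p away from orthonormality.
U_drift = U_p + 0.1 * rng.standard_normal(U_p.shape)
penalty_after_drift = orthonormal_penalty(U_drift, Vt_p)
```

Adding a term like this to the training loss, scaled by a weight, pulls the factors back toward orthonormal columns, so that the entries of `S_p` continue to behave like singular values of the principal weights.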