CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation

Muhammad Fawi

DOI: https://doi.org/10.5281/zenodo.12730055

2024-08-27

Abstract: This paper introduces CURLoRA, a novel approach to fine-tuning large language models (LLMs) that leverages CUR matrix decomposition in the context of Low-Rank Adaptation (LoRA). Our method addresses two critical challenges in LLM fine-tuning: mitigating catastrophic forgetting during continual learning and reducing the number of trainable parameters. We propose a unique modification to the CUR decomposition process: we use inverted probabilities for column and row selection, which acts as an implicit regularization, and we initialize the $U$ matrix as a zero matrix and fine-tune only that matrix. We demonstrate through experiments on multiple datasets that CURLoRA outperforms standard LoRA in mitigating catastrophic forgetting. It maintains model stability and performance across tasks while significantly reducing the number of trainable parameters. Our results show that, compared to LoRA, CURLoRA achieves very good and stable task accuracy while keeping the base model's perplexity scores fixed under continual fine-tuning, particularly in scenarios with limited data.

Machine Learning, Artificial Intelligence, Computation and Language

### What problem does this paper attempt to solve?

This paper introduces CURLoRA, a novel approach that aims to improve the fine-tuning of large language models (LLMs) by leveraging CUR matrix decomposition. It addresses two key issues:

1. **Catastrophic Forgetting**:
   - During continual learning, models often forget previously learned knowledge when fine-tuned on new tasks. CURLoRA mitigates this by adopting a modified CUR decomposition: columns and rows are selected using inverted probabilities, and the $U$ matrix is initialized as a zero matrix and is the only matrix that is fine-tuned.
2. **Reducing the Number of Trainable Parameters**:
   - Fine-tuning large language models typically requires substantial computational resources. CURLoRA makes fine-tuning more efficient by reducing the number of parameters that need to be trained.

### Main Contributions

1. **A Modified CUR Decomposition Method**:
   - It selects columns and rows using inverted probabilities and initializes the $U$ matrix as a zero matrix. This method is presented as offering better stability and performance than traditional CUR decomposition.
2. **Theoretical Analysis**:
   - The paper analyzes how CURLoRA alleviates catastrophic forgetting by constraining the parameter space and through implicit regularization.
3. **Experimental Evidence**:
   - Experiments on multiple datasets and models show that CURLoRA outperforms standard LoRA in maintaining model stability and performance while significantly reducing the number of trainable parameters.

In summary, CURLoRA offers a promising approach to the efficient fine-tuning of large language models, and it performs particularly well in scenarios with limited data.
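The mechanism summarized above (inverted-probability selection of columns and rows, a zero-initialized $U$, and training only $U$) can be illustrated with a minimal NumPy sketch. This is not the author's implementation. The function names (`inverted_probs`, `curlora_init`, `adapted_weight`, `trainable_params`) are hypothetical, and several details are assumptions: the base sampling probabilities are taken proportional to squared column and row norms, as in standard CUR sampling; the same number `rank` of columns and rows is selected; and sampling is done without replacement.

```python
import numpy as np


def inverted_probs(W, axis, eps=1e-12):
    """Sampling probabilities that favour low-norm columns (axis=0) or rows (axis=1)."""
    sq_norms = np.sum(W ** 2, axis=axis)
    # Assumption: base probabilities proportional to squared norms (standard CUR sampling).
    probs = sq_norms / sq_norms.sum()
    # Invert so that low-norm vectors become the most likely picks, then renormalize.
    inv = 1.0 / (probs + eps)
    return inv / inv.sum()


def curlora_init(W, rank, seed=0):
    """Build C, U, R from a frozen weight matrix W of shape (m, n)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    col_idx = rng.choice(n, size=rank, replace=False, p=inverted_probs(W, axis=0))
    row_idx = rng.choice(m, size=rank, replace=False, p=inverted_probs(W, axis=1))
    C = W[:, col_idx]            # (m, rank), frozen
    R = W[row_idx, :]            # (rank, n), frozen
    U = np.zeros((rank, rank))   # the only trainable matrix, starts at zero
    return C, U, R


def adapted_weight(W, C, U, R):
    """Effective weight after adaptation: W + C U R."""
    return W + C @ U @ R


def trainable_params(m, n, rank):
    """Trainable parameter counts for one (m, n) layer under the assumptions above."""
    return {"lora": rank * (m + n), "curlora": rank * rank}
```

Because $U$ starts at zero, `adapted_weight` returns exactly `W` before any training step, so the adapted model initially behaves like the base model. Under the same-rank assumption, a 4096 x 4096 layer with `rank=16` would train 131072 parameters with LoRA and 256 with this sketch.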

ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

LoRA Learns Less and Forgets Less

LoRA-Mini : Adaptation Matrices Decomposition and Selective Training

LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

LoRA ensembles for large language model fine-tuning

Learning Attentional Mixture of LoRAs for Language Model Continual Learning

MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Matrix-Transformation Based Low-Rank Adaptation (MTLoRA): A Brain-Inspired Method for Parameter-Efficient Fine-Tuning

PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation