CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation

Muhammad Fawi
DOI: https://doi.org/10.5281/zenodo.12730055
2024-08-27
Abstract: This paper introduces CURLoRA, a novel approach to fine-tuning large language models (LLMs) that leverages CUR matrix decomposition in the context of Low-Rank Adaptation (LoRA). Our method addresses two critical challenges in LLM fine-tuning: mitigating catastrophic forgetting during continual learning and reducing the number of trainable parameters. We propose a unique modification to the CUR decomposition process: columns and rows are selected using inverted probabilities, which acts as an implicit regularization, and the $U$ matrix is initialized as a zero matrix and is the only matrix that is fine-tuned. We demonstrate through experiments on multiple datasets that CURLoRA outperforms standard LoRA in mitigating catastrophic forgetting. It maintains model stability and performance across tasks while significantly reducing the number of trainable parameters. Our results show that, upon continual fine-tuning, CURLoRA achieves very good and stable task accuracy while keeping the base model's perplexity scores fixed compared to LoRA, particularly in scenarios with limited data.
Machine Learning, Artificial Intelligence, Computation and Language
### What problem does this paper attempt to solve?

This paper primarily introduces CURLoRA (a novel approach) that aims to improve the fine-tuning process of large language models (LLMs) by leveraging CUR matrix decomposition. It seeks to address the following two key issues:

1. **Catastrophic Forgetting**:
   - During continual learning, models often forget previously learned knowledge when fine-tuning on new tasks. CURLoRA effectively mitigates this issue by adopting a modified CUR decomposition method, using inverted probability to select columns and rows, and initializing the U matrix as a zero matrix.
2. **Reducing the Number of Trainable Parameters**:
   - Fine-tuning large language models typically requires substantial computational resources. CURLoRA enhances the efficiency of the fine-tuning process by reducing the number of parameters that need to be trained.

### Main Contributions

1. **Proposed a New CUR Decomposition Method**:
   - It uses inverted probability to select columns and rows and initializes the U matrix as a zero matrix. This method offers better stability and performance compared to traditional CUR decomposition.
2. **Theoretical Analysis**:
   - The paper provides a detailed analysis of how CURLoRA alleviates catastrophic forgetting by constraining the parameter space and implicit regularization.
3. **Experimental Evidence**:
   - Experiments conducted on multiple datasets and models demonstrate that CURLoRA outperforms standard LoRA in maintaining model stability and performance while significantly reducing the number of trainable parameters.

In summary, CURLoRA offers a promising approach for the efficient fine-tuning of large language models, particularly excelling in scenarios with limited data.
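
The modified CUR setup described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes that selection probabilities are derived from squared column and row norms (the usual CUR convention) and then inverted and renormalized, and that indices are sampled without replacement. The function names `inverted_probs`, `curlora_init`, and `adapted_weight` are hypothetical, as is the rank-based parameter comparison against LoRA.

```python
import numpy as np

def inverted_probs(W, axis):
    """Inverted selection probabilities (assumed form, not taken from the paper).

    axis=0 scores columns, axis=1 scores rows. Low-norm columns/rows
    receive HIGH probability after inversion.
    """
    sq_norms = np.sum(W ** 2, axis=axis)
    p = sq_norms / sq_norms.sum()      # standard CUR probabilities
    inv = 1.0 / (p + 1e-12)            # invert; epsilon guards against zero norms
    return inv / inv.sum()             # renormalize so the result sums to 1

def curlora_init(W, rank, seed=0):
    """Build frozen C and R from W, plus a zero-initialized trainable U."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    cols = rng.choice(n, size=rank, replace=False, p=inverted_probs(W, axis=0))
    rows = rng.choice(m, size=rank, replace=False, p=inverted_probs(W, axis=1))
    C = W[:, cols]                     # (m, rank), frozen
    R = W[rows, :]                     # (rank, n), frozen
    U = np.zeros((rank, rank))         # (rank, rank), the only trainable matrix
    return C, U, R

def adapted_weight(W, C, U, R):
    """Effective weight: base weight plus the CUR-shaped update."""
    return W + C @ U @ R

# Small demonstration on a random weight matrix.
W = np.random.default_rng(1).normal(size=(8, 6))
rank = 2
C, U, R = curlora_init(W, rank)
W_adapted = adapted_weight(W, C, U, R)

# Trainable-parameter counts for one (m x n) weight at the same rank.
m, n = W.shape
lora_params = rank * (m + n)           # LoRA trains A (m x r) and B (r x n)
curlora_params = rank * rank           # this sketch trains only U (r x r)
```

Because `U` starts at zero, `W_adapted` equals `W` before any training step, so the adapted model initially reproduces the base model. The trainable parameter count in this sketch depends only on the rank, not on the dimensions of the weight matrix.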