Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang
DOI: https://doi.org/10.48550/arXiv.2311.08011
2024-02-16
Abstract: Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the training corpus. Direct secondary fine-tuning with data containing new knowledge may be ineffective in updating knowledge due to the conflict between old and new knowledge. In this paper, we propose a new paradigm for fine-tuning called F-Learning (Forgetting before Learning), which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge. Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning, simultaneously outperforming the existing baselines in most cases. Moreover, we have also discovered that forgetting old knowledge by subtracting the parameters of LoRA can yield a similar effect to subtracting the parameters of full fine-tuning, and occasionally even surpass it significantly.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of updating knowledge in large language models (LLMs). Although LLMs perform well at text understanding and generation, they can absorb wrong or outdated information from the training corpus. Directly fine-tuning a second time on data containing the new knowledge may be ineffective because the old and new knowledge conflict. The paper therefore proposes a new fine-tuning paradigm, F-Learning (Forgetting before Learning), which uses parametric arithmetic to first forget the old knowledge and then learn the new knowledge.

The main contributions of the paper are as follows:
1. It proposes a new fine-tuning paradigm, F-Learning (Forgetting before Learning), for knowledge updating in large language models.
2. Experimental results show that F-Learning improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning, and outperforms existing baseline methods in most cases.
3. The experiments also find that forgetting old knowledge by subtracting LoRA parameters can achieve an effect similar to subtracting full fine-tuning parameters, and occasionally surpasses it significantly.

The paper evaluates F-Learning on two public datasets, zsRE and COUNTERFACT. It also explores how different forgetting rates affect the forgetting of old knowledge, and how the forgetting operation itself affects model performance.
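
The forgetting step can be illustrated with a small sketch. It assumes the parametric arithmetic takes the following form: fine-tune the base model on the old knowledge, take the difference from the base weights as the "old knowledge" parameters, and subtract that difference scaled by a forgetting rate. The function names, the toy weights, and the rate of 0.5 are hypothetical and chosen only for illustration; the fine-tuning runs themselves are not shown. For the LoRA variant, the delta would presumably be the weight update contributed by the trained adapter rather than a full-parameter difference.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def knowledge_delta(base: Params, tuned_on_old: Params) -> Params:
    """Weights fine-tuned on old knowledge minus the base weights."""
    return {
        name: [t - b for t, b in zip(tuned_on_old[name], base[name])]
        for name in base
    }


def forget(base: Params, delta: Params, forgetting_rate: float) -> Params:
    """Subtract the scaled old-knowledge delta from the base weights."""
    return {
        name: [b - forgetting_rate * d for b, d in zip(base[name], delta[name])]
        for name in base
    }


# Hypothetical values; a real run would obtain tuned_on_old by fine-tuning.
base = {"layer.weight": [1.0, 2.0, 3.0]}
tuned_on_old = {"layer.weight": [1.5, 2.0, 2.0]}

delta = knowledge_delta(base, tuned_on_old)
forgotten = forget(base, delta, forgetting_rate=0.5)
print(forgotten)  # {'layer.weight': [0.75, 2.0, 3.5]}
```

Under this reading, the model with the `forgotten` weights is then fine-tuned on the new knowledge as usual. A forgetting rate of 0 leaves the base model unchanged, which is why the rate is a natural knob for the ablation the paper describes.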