Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

DOI: https://doi.org/10.48550/arXiv.2311.08011

2024-02-16

Abstract: Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the training corpus. Direct secondary fine-tuning with data containing new knowledge may be ineffective in updating knowledge due to the conflict between old and new knowledge. In this paper, we propose a new paradigm for fine-tuning called F-Learning (Forgetting before Learning), which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge. Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning, simultaneously outperforming the existing baselines in most cases. Moreover, we have also discovered that forgetting old knowledge by subtracting the parameters of LoRA can yield a similar effect to subtracting the parameters of full fine-tuning, and occasionally even surpass it significantly.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the problem of updating knowledge in large language models (LLMs). Although LLMs perform well at text understanding and generation, they are prone to acquiring wrong or outdated information from the training corpus. Fine-tuning a second time directly on data containing new knowledge may be ineffective, because the old and new knowledge conflict. The paper therefore proposes a new fine-tuning paradigm, F-Learning (Forgetting before Learning), which uses parametric arithmetic to facilitate first the forgetting of old knowledge and then the learning of new knowledge.

The main contributions of the paper are as follows:

1. It proposes a new fine-tuning paradigm, F-Learning (Forgetting before Learning), for knowledge updating in large language models.
2. Experimental results show that F-Learning improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning, and outperforms existing baseline methods in most cases.
3. The experiments also find that forgetting old knowledge by subtracting LoRA parameters can achieve an effect similar to subtracting full fine-tuning parameters, and occasionally surpasses it significantly.

The paper evaluates F-Learning on two public datasets, zsRE and COUNTERFACT. It also explores how different forgetting rates affect the forgetting of old knowledge and how the forgetting operation itself affects model performance.
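The forgetting step described above can be illustrated with a small sketch. The summary does not state the exact update rule, so this assumes the usual form of parametric arithmetic: fine-tune a copy of the model on the old knowledge, treat the difference from the base parameters as a "knowledge delta", and subtract that delta scaled by a forgetting rate. The names `forget`, `base`, `old_ft`, and `rate` are hypothetical, and parameters are plain Python lists rather than real model tensors.

```python
def forget(base, old_ft, rate):
    """Return parameters with a scaled old-knowledge delta subtracted.

    base   -- dict mapping parameter name to a list of floats (original model)
    old_ft -- same layout, after fine-tuning on the OLD knowledge (assumed step)
    rate   -- forgetting rate; 0.0 leaves the base parameters unchanged
    """
    forgotten = {}
    for name, weights in base.items():
        # Delta approximates what fine-tuning on old knowledge changed.
        delta = [f - b for f, b in zip(old_ft[name], weights)]
        # Move the base parameters in the opposite direction of that delta.
        forgotten[name] = [b - rate * d for b, d in zip(weights, delta)]
    return forgotten


# Toy example with a single three-element parameter.
base = {"layer.weight": [1.0, 2.0, 3.0]}
old_ft = {"layer.weight": [1.5, 2.0, 2.0]}
forgotten = forget(base, old_ft, rate=0.5)
print(forgotten["layer.weight"])
```

Under this assumption, the learning stage would then fine-tune the returned parameters on the new knowledge as usual. With LoRA, the delta would come from the adapter weights rather than from a full fine-tuned copy, which is the cheaper variant the third contribution refers to.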

Scaling Laws for Forgetting When Fine-Tuning Large Language Models

Refine Large Language Model Fine-tuning via Instruction Vector

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

Learning with Recoverable Forgetting

STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Exploring Forgetting in Large Language Model Pre-Training

LLM Unlearning via Loss Adjustment with Only Forget Data

HFT: Half Fine-Tuning for Large Language Models

Can LLMs Learn New Concepts Incrementally without Forgetting?

KlF: Knowledge Localization and Fusion for Language Model Continual Learning

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

Time Sensitive Knowledge Editing through Efficient Finetuning

ULMR: Unlearning Large Language Models Via Negative Response and Model Parameter Average

UNLEARN: Efficient Removal of Knowledge in Large Language Models

Less-forgetting Multi-lingual Fine-tuning

Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

Is Parameter Collision Hindering Continual Learning in LLMs?