LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Akshaj Kumar Veldanda, Shi-Xiong Zhang, Anirban Das, Supriyo Chakraborty, Stephen Rawls, Sambit Sahu, Milind Naphade
2024-09-20
Abstract: Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that: (1) performs reverse gradient on the unlearn dataset (problematic and outdated information), (2) performs gradient descent on the update dataset (new and updated information), and (3) minimizes the KL divergence on the retain dataset (a small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets specifically tailored for our novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses three issues with large language models (LLMs):

1. **Removing outdated or problematic knowledge**: Because LLMs absorb large amounts of internet data during pretraining, they may contain outdated information, copyrighted material, and personal private information, which can lead the model to generate inappropriate content.
2. **Integrating new knowledge**: An LLM's knowledge is limited to the time range of its training data, so it cannot pick up newer information in a timely way and may generate inaccurate or outdated content.
3. **Maintaining original performance**: These modifications must not degrade the model's performance on standard benchmark tasks.

The paper proposes a framework called “LLM Surgery” that pursues these goals by optimizing an objective function with three parts:

- Performing reverse gradient on the dataset to be forgotten (i.e., removing specific information);
- Applying gradient descent on the update dataset (i.e., integrating new knowledge);
- Minimizing the KL divergence on the retain dataset (i.e., keeping outputs consistent before and after modification).

Experiments with Llama2-7B show significant forgetting on the unlearn set and a 20% increase in accuracy on the update set, without retraining the model from scratch, while performance on the retain set is maintained. Additionally, compared to traditional methods, this framework is more efficient and can significantly reduce GPU computation time.
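
The three-part objective described above can be illustrated with a minimal pure-Python sketch on toy next-token logits. This is not the authors' implementation: the function names, the equal default weights `w_unlearn`, `w_update`, and `w_retain`, the direction of the KL term, and the use of plain token-level cross-entropy are all assumptions made for illustration. The sketch only shows how a negated loss on the unlearn set, a standard loss on the update set, and a KL penalty on the retain set might combine into one scalar to minimize.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target token index."""
    return -math.log(softmax(logits)[target])

def kl_divergence(ref_logits, new_logits):
    """KL(p_ref || p_new) between two next-token distributions."""
    p = softmax(ref_logits)
    q = softmax(new_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean(values):
    return sum(values) / len(values)

def surgery_loss(unlearn, update, retain,
                 w_unlearn=1.0, w_update=1.0, w_retain=1.0):
    """Combine the three components into one scalar to minimize.

    unlearn, update: lists of (modified_model_logits, target_index) pairs.
    retain: list of (pretrained_model_logits, modified_model_logits) pairs.
    The weights are hypothetical knobs, not values from the paper.
    """
    # Negated so that minimizing the total *raises* the loss on the
    # unlearn set (the "reverse gradient" component).
    l_unlearn = mean([cross_entropy(lg, t) for lg, t in unlearn])
    # Ordinary gradient-descent term on the update set.
    l_update = mean([cross_entropy(lg, t) for lg, t in update])
    # Keeps the modified model close to the pretrained one on retained text.
    l_retain = mean([kl_divergence(ref, new) for ref, new in retain])
    return -w_unlearn * l_unlearn + w_update * l_update + w_retain * l_retain

# Toy example with a 3-token vocabulary.
loss = surgery_loss(
    unlearn=[([2.0, 0.5, 0.1], 0)],                  # model still favors the fact to forget
    update=[([0.2, 1.5, 0.3], 1)],                   # model partly knows the new fact
    retain=[([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])],     # small drift from the pretrained model
)
print(loss)
```

In a real training loop the same combination would be computed on model outputs inside an automatic-differentiation framework so that gradients flow back to the model parameters; the sketch above evaluates only the scalar value.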