LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Akshaj Kumar Veldanda, Shi-Xiong Zhang, Anirban Das, Supriyo Chakraborty, Stephen Rawls, Sambit Sahu, Milind Naphade

2024-09-20

Abstract: Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that: (1) performs reverse gradient on the unlearn dataset (problematic and outdated information), (2) performs gradient descent on the update dataset (new and updated information), and (3) minimizes the KL divergence on the retain dataset (a small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets tailored to this novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set and a 20% increase in accuracy on the update set while maintaining performance on the retain set.
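One way to write the three-component objective as a single loss to minimize is sketched below. The notation is ours, not the paper's: θ denotes the modified model's parameters, θ₀ the pretrained parameters, and the weights α, β, γ ≥ 0 as well as the direction of the KL term are assumptions, since the abstract does not specify them. The unlearn term enters with a negative sign, so minimizing the loss performs gradient ascent (the "reverse gradient") on the unlearn data.

```latex
\mathcal{L}(\theta) =
  -\alpha \, \mathcal{L}_{\mathrm{NLL}}(\theta; D_{\mathrm{unlearn}})
  + \beta \, \mathcal{L}_{\mathrm{NLL}}(\theta; D_{\mathrm{update}})
  + \gamma \, \mathbb{E}_{x \sim D_{\mathrm{retain}}}
      \mathrm{KL}\big(p_{\theta_0}(\cdot \mid x) \,\|\, p_{\theta}(\cdot \mid x)\big),
\qquad
\mathcal{L}_{\mathrm{NLL}}(\theta; D) = -\,\mathbb{E}_{(x, y) \sim D} \log p_{\theta}(y \mid x)
```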

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses three problems with large language models (LLMs):

1. **Removing outdated or problematic knowledge**: Because LLMs absorb large amounts of internet data during pretraining, they may contain outdated information, copyrighted material, and private personal information, which can lead them to generate inappropriate content.
2. **Integrating new knowledge**: LLMs are limited to the time range of their training data and cannot acquire newer information on their own, so they may generate inaccurate or outdated content.
3. **Preserving original performance**: These modifications must not degrade the model's performance on standard benchmark tasks.

To achieve these goals, the paper proposes a framework called “LLM Surgery”, which optimizes an objective function with three parts:

- Reverse gradient on the unlearn dataset (removing specific information);
- Gradient descent on the update dataset (integrating new knowledge);
- Minimizing the KL divergence on the retain dataset (keeping the modified model's outputs consistent with those of the pretrained model).

Experiments with Llama2-7B show that the method achieves significant forgetting on the unlearn set and a 20% increase in accuracy on the update set without retraining the model from scratch, while maintaining performance on the retain set and on other tasks. The authors also report that the framework is more efficient than traditional approaches and substantially reduces GPU computation time.
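The three-part objective can be illustrated with a minimal pure-Python sketch. It assumes the model's probabilities for the target tokens and its next-token distributions are already available as plain lists. The function names (`nll`, `kl_divergence`, `surgery_objective`), the weights `alpha`, `beta`, `gamma`, and the choice of KL(pretrained || modified) are illustrative assumptions, not the paper's code. In a real setup these quantities would come from the model, and an autodiff framework would differentiate the returned value with respect to the model parameters.

```python
import math
from typing import Sequence


def nll(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of target tokens, given the model's probability for each."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surgery_objective(
    unlearn_probs: Sequence[float],
    update_probs: Sequence[float],
    retain_pre: Sequence[Sequence[float]],
    retain_mod: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
) -> float:
    """Hypothetical combined loss to minimize; weights and KL direction are assumptions."""
    # (1) Reverse gradient on the unlearn set: minimizing -NLL is gradient ascent on NLL.
    unlearn_term = -nll(unlearn_probs)
    # (2) Ordinary gradient descent on the update set: minimize NLL.
    update_term = nll(update_probs)
    # (3) Mean KL(pretrained || modified) over next-token distributions on the retain set.
    retain_term = sum(
        kl_divergence(p, q) for p, q in zip(retain_pre, retain_mod)
    ) / len(retain_pre)
    return alpha * unlearn_term + beta * update_term + gamma * retain_term


if __name__ == "__main__":
    # Toy inputs: probabilities the modified model assigns to target tokens,
    # plus pretrained vs. modified next-token distributions on one retain prompt.
    loss = surgery_objective(
        unlearn_probs=[0.9, 0.8],
        update_probs=[0.5, 0.25],
        retain_pre=[[0.7, 0.2, 0.1]],
        retain_mod=[[0.6, 0.3, 0.1]],
    )
    print(f"combined objective: {loss:.4f}")
```

Lowering the probabilities on the unlearn set or raising them on the update set decreases the returned value, while drifting away from the pretrained distribution on retain prompts increases it. Note that the unlearn term is unbounded below as those probabilities approach zero; this sketch applies no clipping or early stopping.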

Machine Unlearning in Large Language Models

Towards Safer Large Language Models through Machine Unlearning

Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models

Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods

UNLEARN: Efficient Removal of Knowledge in Large Language Models

A Closer Look at Machine Unlearning for Large Language Models

Offset Unlearning for Large Language Models

Knowledge Editing for Large Language Models: A Survey

Rethinking Machine Unlearning for Large Language Models

To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models

LLM Unlearning via Loss Adjustment with Only Forget Data

Multi-Objective Large Language Model Unlearning

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

ULMR: Unlearning Large Language Models Via Negative Response and Model Parameter Average

Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

Machine Unlearning of Pre-trained Large Language Models

Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models