MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang

2024-09-18

Abstract: Large Language Models (LLMs) can memorize sensitive information, raising concerns about potential misuse. LLM Unlearning, a post-hoc approach to remove this information from trained LLMs, offers a promising solution to mitigate these risks. However, previous practices face three key challenges: 1. Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. 2. Efficiency: many methods either involve adding similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. 3. Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. Then, we design a new metric, MEMO, to quantify memorization in LLMs. Finally, based on the signals provided by MEMO, we select the most appropriate set of inverted facts and finetune the model based on them. We evaluate MEOW on the commonly used unlearn benchmark, ToFU, with Llama2-7B-Chat and Phi-1.5B, and test it on both NLU and NLG tasks. Results demonstrate significant improvement of MEOW in forget quality without substantial loss in model utility. Meanwhile, MEOW does not exhibit significant degradation in NLU or NLG capabilities, and there is even a slight improvement in NLU performance.
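The abstract names MEMO as a memorization metric but does not give its formula here. As a rough illustration only, the sketch below assumes a MEMO-style score computed as the average ROUGE-L recall between a model's continuation of a truncated prefix and the true remainder of the sequence, taken over a few cut points. The names `memo_score`, `generate`, and the cut ratios `(0.25, 0.5, 0.75)` are hypothetical and are not the paper's definition.

```python
from typing import Callable, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """ROUGE-L recall: LCS length divided by reference length (whitespace tokens)."""
    ref = reference.split()
    if not ref:
        return 0.0
    return lcs_length(candidate.split(), ref) / len(ref)


def memo_score(
    text: str,
    generate: Callable[[str], str],
    ratios: Sequence[float] = (0.25, 0.5, 0.75),  # hypothetical cut points
) -> float:
    """Hypothetical memorization score: cut the text at several points, let the
    model continue each prefix, and average ROUGE-L recall against the true suffix."""
    tokens = text.split()
    scores = []
    for r in ratios:
        # Keep at least one prefix token and, when possible, one suffix token.
        cut = max(1, min(len(tokens) - 1, int(len(tokens) * r)))
        prefix, suffix = " ".join(tokens[:cut]), " ".join(tokens[cut:])
        scores.append(rouge_l_recall(generate(prefix), suffix))
    return sum(scores) / len(scores)
```

Here `generate` is any callable that maps a prompt string to a continuation, for example a thin wrapper around an LLM's greedy decoding. A sequence the model reproduces verbatim scores 1.0, and one it cannot continue scores near 0.0. ROUGE-L recall is chosen in this sketch because it rewards reproducing the suffix in order; the paper's actual overlap measure and truncation scheme may differ.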

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the risk that large language models (LLMs) memorize sensitive information. It proposes **MEOW**, a gradient descent-based unlearning method that fine-tunes the model on inverted facts, and introduces **MEMO**, a new metric that quantifies how strongly an LLM has memorized a sequence.

#### Main Issues:

1. **Privacy leakage**: LLMs may memorize sensitive information from their training data, which can lead to privacy breaches or copyright infringement.
2. **Challenges with existing methods**:
   - **Utility**: Methods that successfully unlearn the target information often cause a severe performance drop, even catastrophic collapse, on unrelated tasks.
   - **Efficiency**: Many methods either add a model of similar size, which slows down unlearning or inference, or require retain data that are difficult to obtain.
   - **Robustness**: Even effective methods may still leak data through extraction techniques.

#### Solutions:

- **MEOW method**: An offline LLM generates a set of inverted facts; the MEMO signal is then used to select the most appropriate subset, and the model is fine-tuned on that subset.
- **MEMO metric**: Quantifies the degree to which an LLM has memorized a given sequence.

With these components, the paper aims to improve forget quality and robustness while preserving model utility. Experiments on the ToFU benchmark with Llama2-7B-Chat and Phi-1.5B show that MEOW significantly improves forget quality without a substantial loss in utility, with no significant degradation in NLU or NLG capabilities and even a slight improvement in NLU performance.
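The selection step in the Solutions list can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `ForgetItem`, `select_inverted_facts`, and `build_finetune_set` are hypothetical names; `score` stands in for a MEMO-style memorization score; and because the summary does not say whether MEOW keeps the highest- or lowest-scoring inverted facts, that choice is exposed as the hypothetical `prefer_high` flag. Generating the inverted facts with an offline LLM and the gradient-descent fine-tuning itself are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ForgetItem:
    question: str
    answer: str          # original answer that should be forgotten
    inverted: List[str]  # candidate inverted facts (assumed produced by an offline LLM)


def select_inverted_facts(
    item: ForgetItem,
    score: Callable[[str], float],
    k: int,
    prefer_high: bool = True,  # hypothetical: which end of the ranking to keep
) -> List[str]:
    """Rank candidate inverted facts by a memorization score and keep k of them."""
    ranked = sorted(item.inverted, key=score, reverse=prefer_high)
    return ranked[:k]


def build_finetune_set(
    items: List[ForgetItem],
    score: Callable[[str], float],
    k: int,
    prefer_high: bool = True,
) -> List[Dict[str, str]]:
    """Pair each forget question with its selected inverted facts as new targets,
    so an ordinary fine-tuning loop trains on them instead of the original answer."""
    return [
        {"prompt": it.question, "target": fact}
        for it in items
        for fact in select_inverted_facts(it, score, k, prefer_high)
    ]
```

In practice `score` could be a closure over a memorization function and the model being unlearned; the toy test below just uses `len` to make the ranking easy to check. The original answer is kept on `ForgetItem` only for reference and is deliberately excluded from the training targets.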
