Abstract: As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning has become an increasingly important research area for addressing privacy, copyright, and related concerns. A conventional LLM unlearning task typically involves two goals: (1) the target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that it possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework that combines two objectives: maximizing the prediction loss on the forget documents while minimizing it on the retain documents. This framework suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that these reversed objectives naturally resolve both of the aforementioned challenges while significantly improving training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the ToFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality. Our code will be publicly available at https://github.com/UCSB-NLP-Chang/ULD.
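To make the logit-difference step concrete, here is a minimal sketch on a toy four-token vocabulary. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy logit values, and the scaling weight `alpha` are all hypothetical and do not come from the abstract, which only states that the unlearned LLM is derived from the logit difference between the target and assistant LLMs.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_difference(target_logits, assistant_logits, alpha=1.0):
    """Subtract the assistant's logits from the target's, token by token.

    `alpha` is a hypothetical scaling weight added for illustration.
    """
    if len(target_logits) != len(assistant_logits):
        raise ValueError("both models must share one vocabulary")
    return [t - alpha * a for t, a in zip(target_logits, assistant_logits)]


# Toy next-token logits; token 2 stands for the answer to be forgotten.
target = [1.0, 0.5, 4.0, 0.2]
# Assumed assistant behaviour: confident on the forget token, flat elsewhere.
assistant = [0.0, 0.0, 5.0, 0.0]

unlearned = logit_difference(target, assistant)
probs_before = softmax(target)
probs_after = softmax(unlearned)
```

In this toy case the forget token's probability drops after the subtraction. Because the assumed assistant is flat on the remaining tokens, the gaps between their logits are left unchanged.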

Demystifying Language Model Forgetting with Low-rank Example Associations

What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement

Exploring Forgetting in Large Language Model Pre-Training

An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates

Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

Scaling Laws for Forgetting When Fine-Tuning Large Language Models

Unforgettable Generalization in Language Models

Can LLMs Learn New Concepts Incrementally without Forgetting?

Chained Tuning Leads to Biased Forgetting

LLM Unlearning via Loss Adjustment with Only Forget Data

Continual Memorization of Factoids in Large Language Models

Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Measuring Forgetting of Memorized Training Examples

MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Refine Large Language Model Fine-tuning via Instruction Vector

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

Examining Forgetting in Continual Pre-training of Aligned Large Language Models