Abstract: As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning has become an increasingly important research area for addressing concerns about LLMs such as privacy and copyright. A conventional LLM unlearning task typically involves two goals: (1) the target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework that combines two objectives: maximizing the prediction loss on the forget documents while minimizing it on the retain documents. This framework suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. ULD then derives the unlearned LLM by computing the logit difference between the target and the assistant LLMs. We show that such reversed objectives naturally resolve both aforementioned challenges while significantly improving training efficiency. Extensive experiments demonstrate that our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold. Notably, our method loses 0% of model utility on the TOFU benchmark, whereas baseline methods may sacrifice 17% of utility on average to achieve comparable forget quality. Our code will be publicly available at https://github.com/UCSB-NLP-Chang/ULD.
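The logit-difference step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the weight `alpha`, and the four-token vocabulary with hand-picked logits are all assumptions made for illustration. A real setup would subtract the assistant LLM's next-token logits from the target LLM's over the full vocabulary.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_difference(target_logits, assistant_logits, alpha=1.0):
    """Subtract the assistant's logits from the target's, token by token.

    `alpha` is a hypothetical weight on the assistant (an assumption here,
    not a value taken from the abstract).
    """
    if len(target_logits) != len(assistant_logits):
        raise ValueError("both models must score the same vocabulary")
    return [t - alpha * a for t, a in zip(target_logits, assistant_logits)]


# Toy vocabulary of 4 tokens; token 0 stands for the "forget" answer.
# The target model strongly prefers it, and the assistant (which is meant
# to remember the forget documents) prefers it even more strongly.
target = [5.0, 1.0, 0.5, 0.0]
assistant = [6.0, 0.0, 0.0, 0.0]

unlearned = logit_difference(target, assistant)
probs_before = softmax(target)
probs_after = softmax(unlearned)

print("top token before:", probs_before.index(max(probs_before)))
print("top token after: ", probs_after.index(max(probs_after)))
```

In this toy case the assistant is flat on tokens 1 to 3, so their relative ordering is unchanged by the subtraction and only the forget token is pushed down. That mirrors the intuition in the abstract that an assistant which has forgotten the retain knowledge leaves it untouched in the difference.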

To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models

Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models

RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

A Closer Look at Machine Unlearning for Large Language Models

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge

UNLEARN Efficient Removal of Knowledge in Large Language Models

Large Scale Knowledge Washing

LLM Unlearning via Loss Adjustment with Only Forget Data

Machine Unlearning in Large Language Models

Unlearn What You Want to Forget: Efficient Unlearning for LLMs

Offset Unlearning for Large Language Models

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models

Machine Unlearning of Pre-trained Large Language Models

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

TOFU: A Task of Fictitious Unlearning for LLMs

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

Towards Safer Large Language Models through Machine Unlearning

Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning