Abstract: We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative AI that is not only safe, secure, and trustworthy, but also resource-efficient without the need for full retraining. We navigate the unlearning landscape in LLMs, covering its conceptual formulation, methodologies, metrics, and applications. In particular, we highlight often-overlooked aspects of existing LLM unlearning research, e.g., unlearning scope, data-model interaction, and multifaceted efficacy assessment. We also draw connections between LLM unlearning and related areas such as model editing, influence functions, model explanation, adversarial training, and reinforcement learning. Furthermore, we outline an effective assessment framework for LLM unlearning and explore its applications in copyright and privacy safeguards and sociotechnical harm reduction.

What problem does this paper attempt to address?

The paper addresses machine unlearning (MU) in large language models (LLMs). Specifically, the goal is to eliminate the influence of undesirable data (such as sensitive or illegal information) from a model without full retraining. At the same time, the model should keep its ability to generate essential knowledge, and causally unrelated information should remain unaffected. The authors argue that LLM unlearning will become a key component of LLM life-cycle management. They also suggest it could serve as a foundation for generative AI that is safe, secure, trustworthy, and resource-efficient.

The main contributions are:

1. **In-depth Review**: A detailed review of the basic concepts and principles of LLM unlearning, covering problem definition, method classification, evaluation methods, and practical applications.
2. **Revealing New Perspectives**: Introducing previously overlooked problem dimensions, such as precisely defining the scope of unlearning, explaining the interaction between data and models, and assessing unlearning efficacy through adversarial evaluation.
3. **Establishing Connections**: A comparative analysis of LLM unlearning and related fields (such as model editing, influence functions, and adversarial training).
4. **Future Outlook**: Identifying new directions and opportunities for LLM unlearning research.

Through these contributions, the authors aim to advance LLM unlearning technology and clarify its opportunities and challenges.
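The trade-off described above (remove the influence of a forget set while preserving everything else) can be illustrated with a toy sketch. One common way to cast it is a regularized objective, assumed here for illustration and not necessarily the exact formulation used in the paper. The objective fits an assumed post-unlearning target response y_f on the forget set, while a λ-weighted loss on a retain set preserves the remaining behavior. A tiny logistic-regression model stands in for an LLM. The data, the "pre-trained" weights, the target labels y_f, and the helper names (`predict`, `bce_grad`, `unlearn`) are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability of label 1 under a linear (logistic) model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def bce_grad(w, data):
    """Gradient of the mean binary cross-entropy over (x, y) pairs."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y  # dBCE/dz for a sigmoid output
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(data)
    return grad


def unlearn(w, forget, retain, lam=1.0, lr=0.4, steps=3000):
    """Gradient descent on  L_forget(w; y_f) + lam * L_retain(w; y).

    `forget` pairs carry the assumed post-unlearning targets y_f;
    `retain` pairs carry the original labels to be preserved.
    """
    w = list(w)
    for _ in range(steps):
        g_f = bce_grad(w, forget)
        g_r = bce_grad(w, retain)
        w = [wi - lr * (gf + lam * gr) for wi, gf, gr in zip(w, g_f, g_r)]
    return w


# Toy data, features (x1, x2, bias). Retain set: label 1 iff x1 > 0.
# Forget set (x2 = 1) was originally labeled 1; y_f = 0 is the hypothetical target.
retain = [((2.0, 0.0, 1.0), 1), ((1.0, 0.0, 1.0), 1),
          ((-1.0, 0.0, 1.0), 0), ((-2.0, 0.0, 1.0), 0)]
forget = [((1.5, 1.0, 1.0), 0), ((2.5, 1.0, 1.0), 0)]

w_pre = [2.0, 0.0, 0.0]  # hypothetical "pre-trained" weights
w_un = unlearn(w_pre, forget, retain)

for name, w in [("before", w_pre), ("after", w_un)]:
    f = [round(predict(w, x), 3) for x, _ in forget]
    r = [round(predict(w, x), 3) for x, _ in retain]
    print(name, "forget:", f, "retain:", r)
```

Raising `lam` places more weight on preserving retain-set behavior relative to forgetting. This toy case is easy because the forget examples differ from the retain examples in a feature (x2) that the retain set never uses. That clean separation is an assumption of the sketch, not a property of real LLM data.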

Rethinking Machine Unlearning for Large Language Models

A Closer Look at Machine Unlearning for Large Language Models

Machine Unlearning in Large Language Models

Machine Unlearning of Pre-trained Large Language Models

The Frontier of Data Erasure: Machine Unlearning for Large Language Models

Machine Unlearning for Traditional Models and Large Language Models: A Short Survey

Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods

Towards Safer Large Language Models through Machine Unlearning

Evaluating Deep Unlearning in Large Language Models

Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning

UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept

An Adversarial Perspective on Machine Unlearning for AI Safety

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

Offset Unlearning for Large Language Models

Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis

Unlearn What You Want to Forget: Efficient Unlearning for LLMs