Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
2024-04-03
Abstract: The objective of digital forgetting is, given a model with undesirable knowledge or behavior, to obtain a new model in which the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Digital forgetting has to be effective (the new model must actually have forgotten the undesired knowledge or behavior), has to retain the performance of the original model on the desirable tasks, and has to be scalable (in particular, forgetting must be more efficient than retraining from scratch on just the tasks or data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail the datasets, models, and metrics used to evaluate forgetting, retaining, and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks.
Cryptography and Security, Artificial Intelligence, Machine Learning
### What problem does this paper attempt to solve?
This paper primarily explores the issue of digital forgetting in large language models (LLMs). Specifically:
1. **Research Background and Motivation**:
- **Privacy Protection**: LLMs are typically pre-trained on unfiltered datasets containing vast amounts of web data, which may include personal information or internal organizational data. A mechanism is therefore needed to ensure that the model does not memorize or leak this sensitive information.
- **Copyright Protection**: Similarly, the model might reproduce copyrighted material, so a method is needed to prevent it from generating directly copied content.
- **Model Robustness**: LLMs may encounter low-quality or incorrect information at different stages (such as pre-training, fine-tuning, etc.), leading to degraded model performance or biased behavior.
- **Alignment with Human Values**: The data sources for LLM pre-training are diverse and unscreened, potentially containing content that does not align with current societal values (such as discrimination based on gender, race, etc.), requiring mechanisms to correct these issues.
2. **Types of Digital Forgetting**:
- **General Forgetting Requests**: Article 17 of the General Data Protection Regulation (GDPR), the "right to erasure," gives data subjects the right to have the controller delete their personal data. How to implement this right in LLMs, however, still requires further research.
3. **Solutions**:
- The paper details various forgetting methods, including global weight modification, local weight modification, architecture modification, and input/output modification, and compares and evaluates each method.
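
To make the idea of global weight modification more concrete, the following is a minimal sketch, not the paper's own method. It assumes gradient ascent on the data to be forgotten as the unlearning technique (the summary above does not name a specific one) and uses a toy one-dimensional logistic model in place of an LLM. The datasets `retain_set` and `forget_set`, the learning rates, and the step counts are all hypothetical values chosen for illustration.

```python
import math

def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def loss(w, b, data):
    """Mean binary cross-entropy of a 1-D logistic model on (x, y) pairs."""
    # For y in {0, 1} the per-sample loss equals softplus(-margin),
    # with margin = (2y - 1) * (w*x + b).
    return sum(softplus(-(2 * y - 1) * (w * x + b)) for x, y in data) / len(data)

def grad(w, b, data):
    """Gradient of the mean cross-entropy with respect to (w, b)."""
    gw = gb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        gw += (p - y) * x
        gb += p - y
    return gw / len(data), gb / len(data)

def step(w, b, data, lr, ascent=False):
    """One gradient step: descent to learn, ascent to unlearn."""
    gw, gb = grad(w, b, data)
    sign = 1.0 if ascent else -1.0
    return w + sign * lr * gw, b + sign * lr * gb

# Hypothetical toy data: (feature, label) pairs.
retain_set = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
forget_set = [(1.5, 1), (2.5, 1)]

# "Training": ordinary gradient descent on all data, forget set included.
w, b = 0.0, 0.0
for _ in range(200):
    w, b = step(w, b, retain_set + forget_set, lr=0.5)
forget_before = loss(w, b, forget_set)
retain_before = loss(w, b, retain_set)

# "Unlearning": a few gradient ascent steps on the forget set only.
# Every weight of the model is updated, hence "global" modification.
for _ in range(20):
    w, b = step(w, b, forget_set, lr=0.5, ascent=True)
forget_after = loss(w, b, forget_set)
retain_after = loss(w, b, retain_set)

print(f"forget loss: {forget_before:.4f} -> {forget_after:.4f}")
print(f"retain loss: {retain_before:.4f} -> {retain_after:.4f}")
```

Comparing the two printed lines mirrors the evaluation criteria from the abstract: the forget loss should rise (forgetting effectiveness), while any change in the retain loss shows how much performance on the retained data was affected. The unlearning loop is also far shorter than the training loop, which is the scalability argument in miniature.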
Through this research, the paper aims to explore effective digital forgetting mechanisms to address issues of privacy, copyright, robustness, and ethics in LLMs.