Machine Unlearning of Pre-trained Large Language Models

Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue
2024-05-30
Abstract: This study investigates the concept of the "right to be forgotten" within the context of large language models (LLMs). We explore machine unlearning as a pivotal solution, with a focus on pre-trained models, a notably under-researched area. Our research delineates a comprehensive framework for machine unlearning in pre-trained LLMs, encompassing a critical analysis of seven diverse unlearning methods. Through rigorous evaluation using curated datasets from arXiv, books, and GitHub, we establish a robust benchmark for unlearning performance, demonstrating that these methods are over $10^5$ times more computationally efficient than retraining. Our results show that integrating gradient ascent with gradient descent on in-distribution data improves hyperparameter robustness. We also provide detailed guidelines for efficient hyperparameter tuning in the unlearning process. Our findings advance the discourse on ethical AI practices, offering substantive insights into the mechanics of machine unlearning for pre-trained LLMs and underscoring the potential for responsible AI development.
Computation and Language, Artificial Intelligence, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper examines the "right to be forgotten" in large language models (LLMs), that is, how to systematically remove the influence of specific data from a model. It focuses on machine unlearning in pre-trained models, a relatively understudied area. The paper proposes a comprehensive framework that unifies machine unlearning methods and analyzes seven of them. Through evaluations on arXiv, books, and GitHub datasets, it establishes a benchmark for unlearning performance and shows that these methods are more than $10^5$ times more computationally efficient than retraining.

The paper points out that current research on machine unlearning mainly targets fine-tuned models, while unlearning in pre-trained models is harder for three reasons: existing methods must be adapted, the pre-training data is often unavailable, and retraining is prohibitively expensive. Its main contributions are defining the unlearning problem for pre-trained LLMs, proposing a unified unlearning framework, introducing an approximate-retraining evaluation benchmark, releasing real pre-training datasets, and providing hyperparameter tuning guidelines for the unlearning process.

In the experimental section, the paper compares the effectiveness of different unlearning methods, such as gradient ascent, fine-tuning with random labels, and unlearning with adversarial samples. The results show that combining gradient ascent with gradient descent on in-distribution data improves robustness to hyperparameter choices. The paper also evaluates unlearning with Membership Inference Attacks (MIA), which test whether the targeted sequences still appear to be part of the model's training data. In summary, the paper aims to provide a comprehensive solution for unlearning in pre-trained LLMs and to promote the development of more responsible and ethical AI systems.
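To make the combination of gradient ascent and gradient descent mentioned above more concrete, the following is a minimal sketch on a toy one-feature logistic model, not the paper's implementation. It assumes the update ascends the loss on a forget set while descending the loss on a retained set. The function names, the `retain_weight` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, data):
    """Mean negative log-likelihood of a one-feature logistic model.

    w is (bias, weight); data is a list of (x, y) pairs with y in {0, 1}.
    """
    total = 0.0
    for x, y in data:
        p = sigmoid(w[0] + w[1] * x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def mean_grad(w, data):
    """Gradient of mean_loss with respect to (bias, weight)."""
    g0 = g1 = 0.0
    for x, y in data:
        err = sigmoid(w[0] + w[1] * x) - y
        g0 += err
        g1 += err * x
    return (g0 / len(data), g1 / len(data))


def unlearn_step(w, forget, retain, lr=0.1, retain_weight=1.0):
    """One combined update (illustrative assumption, not the paper's code).

    Minimizes -loss(forget) + retain_weight * loss(retain):
    gradient ascent on the forget set, gradient descent on the retained set.
    """
    gf = mean_grad(w, forget)
    gr = mean_grad(w, retain)
    return tuple(
        wi + lr * (f - retain_weight * r) for wi, f, r in zip(w, gf, gr)
    )


# Hypothetical toy data: one example to forget, two examples to retain.
forget = [(1.0, 1)]
retain = [(2.0, 1), (-2.0, 0)]

w = (0.0, 0.0)
for _ in range(5):
    w = unlearn_step(w, forget, retain)

print("forget loss:", mean_loss(w, forget))
print("retain loss:", mean_loss(w, retain))
```

Setting `retain_weight=0.0` reduces the step to plain gradient ascent on the forget set. In this sketch the retained-data term acts as a counterweight to the ascent direction, which is one plausible reading of why the combination is reported to be less sensitive to hyperparameters.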