Abstract: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of an algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms on 7B-parameter LMs can unlearn Harry Potter books and news articles. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm does not lead to severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations because they often degrade general model utility and cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: http://muse-bench.github.io

Scalability of memorization-based machine unlearning

Deep Unlearn: Benchmarking Machine Unlearning

Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten

Learning to Unlearn for Robust Machine Unlearning

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Mitigating Memorization In Language Models

A hybrid framework for effective and efficient machine unlearning

Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning

What makes unlearning hard and what to do about it

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Learn to Forget: Memorization Elimination for Neural Networks

MUter: Machine Unlearning on Adversarially Trained Models

To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models

MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Machine Unlearning in Forgettability Sequence

Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning

Machine unlearning through fine-grained model parameters perturbation

Towards Natural Machine Unlearning

A Closer Look at Machine Unlearning for Large Language Models

Learn to Forget: Machine Unlearning Via Neuron Masking