Abstract: The high cost of model training makes it increasingly desirable to develop techniques for unlearning. These techniques seek to remove the influence of a training example without having to retrain the model from scratch. Intuitively, once a model has unlearned, an adversary that interacts with the model should no longer be able to tell whether the unlearned example was included in the model's training set or not. In the privacy literature, this is known as membership inference. In this work, we discuss adaptations of Membership Inference Attacks (MIAs) to the setting of unlearning (leading to their "U-MIA" counterparts). We propose a categorization of existing U-MIAs into "population U-MIAs", where the same attacker is instantiated for all examples, and "per-example U-MIAs", where a dedicated attacker is instantiated for each example. We show that the latter category, wherein the attacker tailors its membership prediction to each example under attack, is significantly stronger. Indeed, our results show that the commonly used U-MIAs in the unlearning literature overestimate the privacy protection afforded by existing unlearning techniques on both vision and language models. Our investigation reveals a large variance in the vulnerability of different examples to per-example U-MIAs. In fact, several unlearning algorithms lead to a reduced vulnerability for some, but not all, examples that we wish to unlearn, at the expense of increasing it for other examples. Notably, we find that the privacy protection for the remaining training examples may worsen as a consequence of unlearning. We also discuss the fundamental difficulty of equally protecting all examples using existing unlearning schemes, due to the different rates at which examples are unlearned. We demonstrate that naive attempts at tailoring unlearning stopping criteria to different examples fail to alleviate these issues.
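
The distinction between the two attacker categories can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a scalar per-example score (for instance a loss or confidence value), the shared threshold, and the Gaussian likelihood-ratio test are all assumptions made for illustration. The population attacker applies one decision rule to every example, while the per-example attacker fits a separate pair of score distributions for each example, here taken from hypothetical reference models that either unlearned the example or were retrained without it.

```python
import numpy as np

def population_attack(scores, threshold):
    """Population-style attacker (illustrative): one shared threshold
    is applied to the scores of all examples."""
    return np.asarray(scores) > threshold

def per_example_llr(score, unlearned_scores, retrained_scores, eps=1e-8):
    """Per-example attacker (illustrative): Gaussian log-likelihood ratio
    fitted to this one example's scores under two hypothetical sets of
    reference models. Positive values favour 'unlearned' over
    'retrained without the example'."""
    mu_u, sd_u = np.mean(unlearned_scores), np.std(unlearned_scores) + eps
    mu_r, sd_r = np.mean(retrained_scores), np.std(retrained_scores) + eps
    log_u = -0.5 * ((score - mu_u) / sd_u) ** 2 - np.log(sd_u)
    log_r = -0.5 * ((score - mu_r) / sd_r) ** 2 - np.log(sd_r)
    return log_u - log_r

# Hypothetical scores for a single example under the two kinds of reference model.
unlearned = np.array([5.0, 5.2, 4.8, 5.1, 4.9])
retrained = np.array([4.0, 4.2, 3.8, 4.1, 3.9])

llr_high = per_example_llr(5.0, unlearned, retrained)  # score typical of "unlearned"
llr_low = per_example_llr(4.0, unlearned, retrained)   # score typical of "retrained"
shared = population_attack([5.0, 0.5], threshold=2.0)  # same rule for both examples
```

Because the per-example rule is calibrated to each example's own score distributions, it can separate the two cases even when the example's typical score sits far from any threshold that works on average, which is one way to read the abstract's claim that per-example attackers are stronger.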

Label-only Membership Inference Attacks on Machine Unlearning Without Dependence of Posteriors

FP2-MIA: A Membership Inference Attack Free of Posterior Probability in Machine Unlearning

You Only Query Once: an Efficient Label-Only Membership Inference Attack

Label-Only Membership Inference Attack Based on Model Explanation

Membership Inference via Backdooring

Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning

Membership inference attack with relative decision boundary distance

Defending Against Membership Inference Attacks: RM Learning is All You Need

Adversarial Machine Unlearning

Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning

Machine Learning with Membership Privacy using Adversarial Regularization

Defending Against Label-Only Attacks via Meta-Reinforcement Learning

Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning

Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning

Label-Only Membership Inference Attack against Node-Level Graph Neural Networks

A Method to Facilitate Membership Inference Attacks in Deep Learning Models

Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy

Pseudo Unlearning via Sample Swapping with Hash

OSLO: One-Shot Label-Only Membership Inference Attacks

Conditional Matching GAN Guided Reconstruction Attack in Machine Unlearning

CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning