Label-only Membership Inference Attacks on Machine Unlearning Without Dependence of Posteriors

Zhaobo Lu, Hai Liang, Minghao Zhao, Qingzhe Lv, Tiancai Liang, Yilei Wang
DOI: https://doi.org/10.1002/int.23000
IF: 8.993
2022-01-01
International Journal of Intelligent Systems
Abstract: Machine unlearning is the process by which a deployed machine learning model is made to forget some of its training data items. It normally yields two machine learning models, the original model and the unlearned model, which reflect the training results before and after the data items are deleted. However, recent studies find that machine unlearning is vulnerable to membership inference attacks: because a model behaves differently on training and nontraining data (i.e., data items in the training set receive high posterior probabilities), attackers can exploit this property to infer whether an item was used to train the original model. Nevertheless, such attacks fail in label-only settings, in which the attacker cannot obtain the posteriors. In this paper, we propose a new label-only membership inference attack scheme targeted at machine unlearning that eliminates the dependence on posteriors. Our heuristic is that turbulence injected into candidate samples behaves differently for training and nontraining data. Thus, in our scheme, the attacker iteratively queries the original and unlearned models and injects turbulence to change their predicted labels; it determines whether an item has been deleted by observing the disturbance amplitude. Extensive experiments on the MNIST, CIFAR10, CIFAR100, and STL10 data sets show that our method achieves high inference accuracy (measured by AUC) in label-only settings, for example, AUC = 0.96 on the MNIST data set. In addition, we analyze existing countermeasures for mitigating inference attacks and find that our scheme can bypass most of them.
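The heuristic in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only a toy reading of it under stated assumptions: the attacker sees hard labels only, the "turbulence" is simplified to deterministic axis-aligned perturbations of growing size, and the deletion signal is assumed to be the gap between the amplitude needed to flip the original model's label and the amplitude needed to flip the unlearned model's label. The names `min_flip_amplitude`, `deletion_score`, and both toy models are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A label-only oracle: takes a sample, returns a hard label (no posteriors).
LabelOracle = Callable[[Sequence[float]], int]


def min_flip_amplitude(predict_label: LabelOracle, x: Sequence[float],
                       step: float = 0.05, max_steps: int = 100) -> Optional[float]:
    """Return the smallest probed amplitude that changes the predicted label.

    Assumption: axis-aligned +/- perturbations stand in for injected noise.
    Returns None if no probe flips the label within the query budget.
    """
    base = predict_label(x)
    for k in range(1, max_steps + 1):
        amp = k * step
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                probe = list(x)
                probe[i] += sign * amp
                if predict_label(probe) != base:
                    return amp
    return None


def deletion_score(original: LabelOracle, unlearned: LabelOracle,
                   x: Sequence[float], step: float = 0.05,
                   max_steps: int = 100) -> float:
    """Amplitude gap between the two models (assumed decision statistic).

    A larger positive value means the sample is harder to flip on the
    original model than on the unlearned one, which this sketch treats
    as evidence that the sample was deleted.
    """
    cap = step * (max_steps + 1)  # stand-in when the label never flips
    a_orig = min_flip_amplitude(original, x, step, max_steps)
    a_unl = min_flip_amplitude(unlearned, x, step, max_steps)
    a_orig = cap if a_orig is None else a_orig
    a_unl = cap if a_unl is None else a_unl
    return a_orig - a_unl


# Hypothetical 1-D toy models: unlearning moves the decision boundary
# toward the deleted sample and away from the retained one.
def toy_original(x: Sequence[float]) -> int:
    return int(x[0] > -0.02)


def toy_unlearned(x: Sequence[float]) -> int:
    return int(x[0] > 0.83)


deleted_candidate = [1.0]
retained_candidate = [-1.0]
score_deleted = deletion_score(toy_original, toy_unlearned, deleted_candidate)
score_retained = deletion_score(toy_original, toy_unlearned, retained_candidate)
```

In this toy setup the deleted candidate gets a positive score and the retained one a negative score. A real attack would need a threshold calibrated on shadow data and a perturbation scheme suited to the input domain, neither of which is specified in the abstract.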