Understanding and defending against White-box membership inference attack in deep learning

Di Wu, Saiyu Qi, Yong Qi, Qian Li, Bowen Cai, Qi Guo, Jingxian Cheng
DOI: https://doi.org/10.1016/j.knosys.2022.110014
IF: 8.139
2023-01-01
Knowledge-Based Systems
Abstract: Membership inference attacks (MIA) exploit the fact that deep learning algorithms leak information about their training data through the learned model. They have been treated as an indicator of the privacy leakage of machine learning models. In this work, we aim to understand the advantage achieved by White-box MIA and to defend against it. First, we estimate the KL divergence between the training set and the test set on the hidden layers’ features and use it as the internal generalization gap. By comparing the internal generalization gap with the generalization gap on the output layer, we derive two insights: (1) larger generalization gaps exist on hidden layers, and (2) minimizing the generalization gap is a feasible defense against White-box MIA. Based on these insights, we design a novel defense method named Nirvana, which intentionally minimizes the generalization gap. Specifically, Nirvana selects a hidden layer with a large generalization gap and applies a multi-sample convex combination to the features on that layer during training. Finally, we empirically compare Nirvana with state-of-the-art defense methods on the CIFAR100, Purchase100, and Texas datasets. The experimental results show that Nirvana achieves a trade-off between utility and privacy: it defends against both White-box MIA and Black-box MIA while maintaining the test accuracy of the model, and it outperforms previous defense methods in defending against White-box MIA.
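The two ideas named in the abstract, a KL-based gap between training and test features and a multi-sample convex combination of hidden features, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the KL divergence is estimated or how the mixing weights are chosen, so the smoothed-histogram estimator, the normalized Gamma(1, 1) weights, and the function names histogram, kl_divergence, and convex_combination are all assumptions made for illustration.

```python
import math
import random


def histogram(values, bins, lo, hi):
    """Normalized histogram over [lo, hi] with add-one smoothing.

    Assumption: a smoothed histogram stands in for whatever density
    estimate the paper uses; smoothing keeps every bin non-zero.
    """
    counts = [1.0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def convex_combination(features, rng):
    """Mix k feature vectors with non-negative weights that sum to 1.

    Assumption: weights are normalized Gamma(1, 1) draws, i.e. a
    uniform Dirichlet sample; the paper may choose them differently.
    """
    raw = [rng.gammavariate(1.0, 1.0) for _ in features]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(features[0])
    mixed = [sum(w * f[d] for w, f in zip(weights, features))
             for d in range(dim)]
    return mixed, weights


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical scalar activations of one hidden unit on train/test data.
    train_acts = [rng.gauss(1.0, 0.5) for _ in range(1000)]
    test_acts = [rng.gauss(0.6, 0.7) for _ in range(1000)]
    p = histogram(train_acts, bins=20, lo=-2.0, hi=3.0)
    q = histogram(test_acts, bins=20, lo=-2.0, hi=3.0)
    print("internal gap (KL estimate):", kl_divergence(p, q))

    # Mix the hidden features of three hypothetical training samples.
    feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    mixed, weights = convex_combination(feats, rng)
    print("weights:", weights)
    print("mixed feature:", mixed)
```

In an actual training loop the mixing would be applied to the activations of the selected hidden layer within each batch, and the labels would typically need to be combined with the same weights; the abstract does not give these details, so they are left out of the sketch.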
computer science, artificial intelligence