Label-Only Membership Inference Attack Based on Model Explanation

Yao Ma,Xurong Zhai,Dan Yu,Yuli Yang,Xingyu Wei,Yongle Chen
DOI: https://doi.org/10.1007/s11063-024-11682-1
IF: 2.565
2024-09-20
Neural Processing Letters
Abstract:It is well known that machine learning models (e.g., image recognition) can unintentionally leak information about the training set. Conventional membership inference relies on posterior vectors, and this task becomes extremely difficult when the posterior is masked. However, current label-only membership inference attacks require a large number of queries during the generation of adversarial samples, and thus incorrect inference generates a large number of invalid queries. Therefore, we introduce a label-only membership inference attack based on model explanations. It can transform a label-only attack into a traditional membership inference attack by observing neighborhood consistency and perform fine-grained membership inference for vulnerable samples. We use feature attribution to simplify the high-dimensional neighborhood sampling process, quickly identify decision boundaries and recover a posteriori vectors. It also compares different privacy risks faced by different samples through finding vulnerable samples. The method is validated on CIFAR-10, CIFAR-100 and MNIST datasets. The results show that membership attributes can be identified even using a simple sampling method. Furthermore, vulnerable samples expose the model to greater privacy risks.
computer science, artificial intelligence
What problem does this paper attempt to address?