Improved Membership Inference Attacks Against Language Classification Models

Shlomit Shachor, Natalia Razinkov, Abigail Goldsteen
2024-07-18
Abstract: Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.
Machine Learning, Artificial Intelligence, Cryptography and Security
What problem does this paper attempt to address?
This paper aims to improve the accuracy of membership inference attacks (MIA) against classification models. Specifically, the authors propose a framework that uses an ensemble method to generate multiple attack models, each specialized for a different subset of the data.

### Problem Background

With the wide application of artificial intelligence (AI) systems, privacy risks have become a focus of attention. For machine learning (ML) models in particular, membership inference attacks aim to distinguish samples that were in the training data (members) from samples that were not (non-members). A successful attack shows that the model leaks information about the data it was trained on, so the attack's success rate serves as a measure of the model's privacy risk.

### Research Objectives

Existing membership inference attacks usually use either a single attack model or one attack model per class label, and these approaches have limited effectiveness in some cases. To improve attack accuracy, the paper proposes a framework built on three steps:

1. **Ensemble Method**: Divide the initial member and non-member data sets into multiple small, non-overlapping subsets and train a specialized attack model for each subset.
2. **Model Optimization**: For each subset, try multiple combinations of attack model architecture, input features, and scaling method, and select the combination with the highest attack performance.
3. **Result Aggregation**: Aggregate the results of the individual attack models to give a more complete picture of the target model's actual leakage.

### Main Contributions

- A new framework based on the ensemble method that achieves higher membership inference accuracy than a single attack model or an attack model per class label.
- The framework applies to both classical models and large language models (LLMs), and performs well against models to which privacy defenses have been applied.
- Experimental results show improved attack performance over both baselines on a variety of data sets.

With this method, researchers can evaluate the privacy risks of machine learning models more accurately, which helps organizations make more informed decisions about whether to use, deploy, or share a model.
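
The three steps listed under Research Objectives (partitioning, per-subset selection, aggregation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the trained attack models are replaced by a simple threshold rule on one feature, the "combinations" are reduced to a choice of input feature, and the names `ensemble_attack`, `confidence`, and `neg_loss` are hypothetical.

```python
import random

def count_hits(values, labels, cutoff):
    """Count samples where the rule (value >= cutoff) matches the member label (1)."""
    return sum((v >= cutoff) == (y == 1) for v, y in zip(values, labels))

def fit_threshold(values, labels):
    """Return the observed value that works best as a member/non-member cutoff."""
    return max(sorted(set(values)), key=lambda c: count_hits(values, labels, c))

def split_half(rows):
    """Split a list into a fitting half and a held-out half."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]

def ensemble_attack(members, non_members, n_subsets, features, seed=0):
    """members / non_members: lists of dicts mapping feature name -> float.

    Assumes every subset holds at least two members and two non-members,
    so that both halves of each split are non-empty.
    """
    rng = random.Random(seed)
    members, non_members = list(members), list(non_members)
    rng.shuffle(members)
    rng.shuffle(non_members)

    chosen, correct, total = [], 0, 0
    for i in range(n_subsets):
        # Step 1: striding gives non-overlapping subsets of each population.
        m_fit, m_eval = split_half(members[i::n_subsets])
        n_fit, n_eval = split_half(non_members[i::n_subsets])
        fit_rows = m_fit + n_fit
        fit_labels = [1] * len(m_fit) + [0] * len(n_fit)
        eval_rows = m_eval + n_eval
        eval_labels = [1] * len(m_eval) + [0] * len(n_eval)

        # Step 2: try each candidate feature, keep the best one on the fitting half.
        best = None
        for name in features:
            values = [row[name] for row in fit_rows]
            cutoff = fit_threshold(values, fit_labels)
            score = count_hits(values, fit_labels, cutoff)
            if best is None or score > best[0]:
                best = (score, name, cutoff)
        _, name, cutoff = best
        chosen.append(name)

        # Step 3: pool held-out results across all specialized attacks.
        eval_values = [row[name] for row in eval_rows]
        correct += count_hits(eval_values, eval_labels, cutoff)
        total += len(eval_labels)

    return {"accuracy": correct / total, "evaluated": total, "chosen": chosen}

# Toy data (fabricated for illustration only): members get high confidence,
# non-members low confidence, and neg_loss carries no signal.
toy_members = [{"confidence": 0.9, "neg_loss": -1.0} for _ in range(40)]
toy_non_members = [{"confidence": 0.2, "neg_loss": -1.0} for _ in range(40)]
toy_result = ensemble_attack(toy_members, toy_non_members, n_subsets=4,
                             features=["neg_loss", "confidence"])
```

Pooling held-out predictions is one plausible reading of "aggregate the results"; the summary does not say which aggregation the paper uses, so treat that choice as an assumption. Selecting the feature on the fitting half and scoring on the other half keeps the reported accuracy from being inflated by the selection step.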