SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks

Guy Amit, Abigail Goldsteen, Ariel Farkash
DOI: https://doi.org/10.48550/arXiv.2403.08481
2024-03-13
Abstract: Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
Machine Learning, Cryptography and Security
### What problem does this paper attempt to solve?

This paper addresses the **privacy risks that fine-tuned large language models (LLMs) face from membership inference attacks (MIA)**. Specifically, it focuses on the following aspects:

1. **Risk of privacy leakage**: As natural language processing (NLP) models become widely used, many applications fine-tune general base models on customized, proprietary datasets. This fine-tuning data may contain personal or sensitive information, which increases the risk of privacy leakage.
2. **Factors influencing membership inference attacks**: The paper systematically analyzes the factors that affect the vulnerability of fine-tuned LLMs to MIA, including but not limited to:
   - **Overfitting**: An overfitted model memorizes detailed features of its training data, which increases the success rate of MIA.
   - **Model size and capacity**: A larger number of model parameters may lead to a greater capacity to memorize data.
   - **Batch size**: A larger batch size can reduce vulnerability to MIA.
   - **Number of training iterations**: More training iterations increase the success rate of MIA.
3. **Effectiveness of defense strategies**: The paper evaluates how well different defense strategies protect fine-tuned LLMs against MIA. It finds that:
   - **Differential privacy (DP) methods**, such as DP-SGD and DP-LoRA (DP combined with low-rank adaptors), can significantly reduce privacy risk.
   - **LoRA (Low-Rank Adaptation)**, on its own and in combination with smaller model sizes, can significantly reduce MIA risk without significantly affecting model accuracy.

### Main contributions of the paper

1. **First systematic review**: This is the first systematic review devoted to the vulnerability of fine-tuned LLMs to MIA.
2. **Comprehensive analysis of influencing factors**: It analyzes in detail the factors that affect the vulnerability of fine-tuned LLMs to MIA.
3. **Evaluation of existing defense technologies**: It evaluates the effectiveness of current defense technologies in protecting fine-tuned LLMs against MIA and proposes some new defense methods.

Through these studies, the paper provides insights and practical guidance on improving the privacy protection of fine-tuned LLMs.
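
To make the link between overfitting and MIA success more concrete, here is a minimal sketch of a loss-threshold attack, one common MIA baseline. This summary does not say which attacks the paper evaluates, so the choice of attack is an assumption. The per-sample losses are synthetic, and every name and number (`simulate_losses`, the means, the spread) is hypothetical, chosen only to mimic a small versus a large gap between training and held-out loss.

```python
import random


def threshold_attack_accuracy(member_losses, nonmember_losses, threshold):
    """Balanced accuracy of the rule 'loss < threshold => predict member'."""
    tpr = sum(l < threshold for l in member_losses) / len(member_losses)
    tnr = sum(l >= threshold for l in nonmember_losses) / len(nonmember_losses)
    return 0.5 * (tpr + tnr)


def best_threshold_accuracy(member_losses, nonmember_losses):
    """Try every observed loss as a threshold and keep the best balanced accuracy."""
    candidates = sorted(set(member_losses) | set(nonmember_losses))
    return max(
        threshold_attack_accuracy(member_losses, nonmember_losses, t)
        for t in candidates
    )


def simulate_losses(n, mean, rng):
    """Hypothetical per-sample losses: absolute Gaussian draws centred on `mean`."""
    return [abs(rng.gauss(mean, 0.5)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500

# Held-out samples the model never saw during fine-tuning.
nonmembers = simulate_losses(n, 2.0, rng)
# Training samples of a model that generalizes well (small train/test loss gap).
generalized_members = simulate_losses(n, 1.9, rng)
# Training samples of an overfitted model (large train/test loss gap).
overfit_members = simulate_losses(n, 0.8, rng)

acc_generalized = best_threshold_accuracy(generalized_members, nonmembers)
acc_overfit = best_threshold_accuracy(overfit_members, nonmembers)

print(f"attack accuracy, well-generalized model: {acc_generalized:.3f}")
print(f"attack accuracy, overfitted model:       {acc_overfit:.3f}")
```

A balanced accuracy near 0.5 means the attacker does no better than guessing. In this toy setup, the overfitted model's low training losses separate members from non-members far more cleanly. That is the intuition behind defenses that limit memorization, such as DP training or restricting the number of trainable parameters with LoRA.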