Abstract: Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary datasets. This fine-tuning data is especially likely to contain personal or sensitive information about individuals, resulting in increased privacy risk. Membership inference attacks are the most commonly employed attack to assess the privacy leakage of a machine learning model. However, limited research is available on the factors that affect the vulnerability of language models to this kind of attack, or on the applicability of different defense strategies in the language domain. We provide the first systematic review of the vulnerability of fine-tuned large language models to membership inference attacks, the various factors that come into play, and the effectiveness of different defense strategies. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
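For readers unfamiliar with the attack, the following is a minimal sketch of a loss-threshold membership inference attack, a common baseline and not necessarily the attack evaluated in the paper. It assumes the attacker can query per-example losses; the loss distributions are synthetic, and the function names and the midpoint threshold are hypothetical simplifications chosen for illustration.

```python
import random
from statistics import mean


def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for every example whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses):
    """Fraction of correct guesses, using the midpoint of the two mean losses
    as the threshold (a simplification: a real attacker must estimate it)."""
    threshold = (mean(member_losses) + mean(nonmember_losses)) / 2
    member_hits = sum(loss_threshold_attack(member_losses, threshold))
    nonmember_hits = sum(
        not guess for guess in loss_threshold_attack(nonmember_losses, threshold)
    )
    return (member_hits + nonmember_hits) / (len(member_losses) + len(nonmember_losses))


# Synthetic losses (assumed values, not results from the paper).
rng = random.Random(0)
nonmember_losses = [rng.gauss(2.0, 0.5) for _ in range(1000)]
# A model that generalizes well: training losses close to held-out losses.
generalizing_member_losses = [rng.gauss(1.9, 0.5) for _ in range(1000)]
# An overfit model: training losses far below held-out losses.
overfit_member_losses = [rng.gauss(0.5, 0.5) for _ in range(1000)]

acc_generalizing = attack_accuracy(generalizing_member_losses, nonmember_losses)
acc_overfit = attack_accuracy(overfit_member_losses, nonmember_losses)
print(f"attack accuracy, generalizing model: {acc_generalizing:.2f}")
print(f"attack accuracy, overfit model:      {acc_overfit:.2f}")
```

The wider the gap between training and held-out loss, the easier it is to separate members from non-members. This is one intuition for why overfitting tends to raise attack success.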

### What problem does this paper attempt to solve?

This paper addresses the **privacy risks that fine-tuned large language models (LLMs) face from membership inference attacks (MIA)**. Specifically, it focuses on the following aspects:

1. **Risk of privacy leakage**: As natural language processing (NLP) models become widely used, many applications require fine-tuning general base models on customized, proprietary datasets. This fine-tuning data may contain personal or sensitive information, which increases the risk of privacy leakage.
2. **Factors influencing membership inference attacks**: The paper systematically analyzes the factors that affect the vulnerability of fine-tuned LLMs to MIA, including but not limited to:
   - **Overfitting**: An overfit model memorizes detailed features of the training data, which increases the success rate of MIA.
   - **Model size and capacity**: A larger number of model parameters may lead to a greater capacity to memorize data.
   - **Batch size**: A larger batch size can reduce the vulnerability to MIA.
   - **Number of training iterations**: More training iterations increase the success rate of MIA.
3. **Effectiveness of defense strategies**: The paper evaluates how well different defense strategies prevent membership inference attacks, specifically for fine-tuned LLMs. It finds that:
   - **Differential privacy (DP) methods**, such as DP-SGD and DP combined with low-rank adaptors (DP-LoRA), can significantly reduce privacy risk.
   - **LoRA (Low-Rank Adaptation)**, on its own and in combination with smaller model sizes, can significantly reduce MIA risk without significantly affecting model accuracy.

### Main contributions of the paper

1. **First systematic review**: This is the first systematic review devoted to the vulnerability of fine-tuned large language models to membership inference attacks.
2. **Comprehensive analysis of influencing factors**: It analyzes in detail the factors that affect the vulnerability of fine-tuned LLMs to MIA.
3. **Evaluation of existing defense technologies**: It evaluates the effectiveness of current defense technologies in protecting fine-tuned LLMs against MIA and proposes some new defense methods.

Through these studies, the paper provides insights and practical guidance on improving the privacy protection of fine-tuned large language models.
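The DP-SGD defense mentioned above can be sketched as follows. This is a minimal illustration of the generic DP-SGD update (clip each per-example gradient, add Gaussian noise to the sum, then average), not the paper's implementation. The function names and hyperparameter values are hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter list.

    Each example's gradient is clipped to max_norm, the clipped gradients are
    summed, Gaussian noise with standard deviation noise_multiplier * max_norm
    is added to each coordinate, and the result is divided by the batch size.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / batch_size
        for i in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Toy usage with made-up gradients (illustrative values only).
rng = random.Random(0)
params = [0.0, 0.0]
per_example_grads = [[3.0, 4.0], [0.3, 0.4]]
new_params = dp_sgd_step(
    params, per_example_grads, lr=0.1, max_norm=1.0, noise_multiplier=1.0, rng=rng
)
print(new_params)
```

Clipping bounds how much any single training example can move the parameters, and the noise masks what remains of that contribution. Applying the same update only to low-rank adaptor weights would correspond to the DP plus LoRA combination the summary describes, although the paper's exact setup is not given here.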

SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks
