Abstract: Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classified into two categories: reference-free and reference-based attacks. They are both based on the hypothesis that training records consistently strike a higher probability of being sampled. Nevertheless, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. The reference-based attack seems to achieve promising effectiveness in LLMs, which measures a more reliable membership signal by comparing the probability discrepancy between the target model and the reference model. However, the performance of reference-based attack is highly dependent on a reference dataset that closely resembles the training dataset, which is usually inaccessible in the practical scenario. Overall, existing MIAs are unable to effectively unveil privacy leakage over practical fine-tuned LLMs that are overfitting-free and private. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal, probabilistic variation, which is based on memorization rather than overfitting. Furthermore, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs.
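The difference between the two attack families named in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `LogProbFn` interface that returns the log-probability of a token sequence, and it uses toy unigram models as stand-ins for the target and reference LMs; none of these names come from the paper.

```python
import math
from typing import Callable, Mapping, Sequence

# Hypothetical interface: maps a token sequence to its log-probability under a model.
LogProbFn = Callable[[Sequence[str]], float]


def make_unigram_logprob(probs: Mapping[str, float], floor: float = 1e-6) -> LogProbFn:
    """Toy stand-in for a language model: independent per-token probabilities."""
    def logprob(tokens: Sequence[str]) -> float:
        return sum(math.log(probs.get(tok, floor)) for tok in tokens)
    return logprob


def reference_free_score(target: LogProbFn, tokens: Sequence[str]) -> float:
    """Reference-free signal: the target model's log-probability of the record."""
    return target(tokens)


def reference_based_score(target: LogProbFn, reference: LogProbFn,
                          tokens: Sequence[str]) -> float:
    """Reference-based signal: target log-probability minus reference log-probability."""
    return target(tokens) - reference(tokens)


def is_member(score: float, threshold: float) -> bool:
    """Predict membership when the score exceeds a chosen threshold."""
    return score > threshold


# Toy models (assumed values, for illustration only).
target = make_unigram_logprob({"the": 0.5, "cat": 0.3, "dog": 0.2})
reference = make_unigram_logprob({"the": 0.5, "cat": 0.1, "dog": 0.4})

record = ["the", "cat"]
calibrated = reference_based_score(target, reference, record)
print(f"calibrated score: {calibrated:.4f}, member: {is_member(calibrated, 0.0)}")
```

Subtracting the reference score removes the part of the probability that is explained by the record simply being common text, which is why the quality of the reference model (and of the dataset it was trained on) matters so much for this family of attacks.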

What problem does this paper attempt to address?

The paper addresses the limitations of existing Membership Inference Attacks (MIAs) when they are applied to Large Language Models (LLMs). Specifically:

1. **Dependence on the overfitting assumption**: Existing MIA methods usually assume that training records have a higher sampling probability than non-training records. This assumption only holds when the model overfits. However, LLMs reduce overfitting through various regularization methods and their generalization capability, so these methods suffer from a high false-positive rate in practice.
2. **Dependence on the reference dataset**: Existing reference-based attacks rely on a reference dataset whose distribution is similar to that of the training dataset. In practical scenarios such a high-quality reference dataset is very difficult to obtain, which significantly reduces the performance of reference-based MIAs.

To solve these problems, the authors propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). The main innovations of this method are:

- **Self-prompt method**: By prompting the target LLM itself to generate text, the adversary constructs a reference dataset with a distribution similar to that of the training dataset, which removes the dependence on a high-quality reference dataset.
- **Probabilistic variation assessment**: A new membership signal, probabilistic variation, is introduced. It is based on memorization in the LLM rather than overfitting and can detect member records more reliably.

Through these two modules, SPV-MIA significantly improves the performance of MIA on multiple datasets and LLMs, raising the AUC from 0.7 to over 0.9.

### Formula summary

1. **Joint probability maximization**:

   \[ L_{\text{CLM}} = -\frac{1}{M} \sum_{j = 1}^{M} \sum_{i = 1}^{|x^{(j)}|} \log p_\theta(t_i \mid x^{(j)}_{<i}) \]

   where \(M\) is the number of training records, and \(p_\theta(t_i \mid x^{(j)}_{<i})\) is the probability of predicting the next token \(t_i\) given the prefix \(x^{(j)}_{<i}\).

2. **Definition of probabilistic variation**:

   \[ \tilde{p}_\theta(x) := \mathbb{E}_z\left[z^{\top} H_p(x) z\right] \]

   where \(H_p(x)\) is the Hessian matrix of the probability function \(p_\theta(x)\), and \(z^{\top} H_p(x) z\) is the second-order directional derivative in the direction \(z\).

3. **Symmetric approximation**:

   \[ z^{\top} H_p(x) z \approx \frac{p_\theta(x + hz) + p_\theta(x - hz) - 2 p_\theta(x)}{h^2} \]

   Averaging over \(N\) sampled perturbations \(z_n\), absorbing the step size \(h\) into \(z_n\), and dropping the constant scale factor, this simplifies to:

   \[ \tilde{p}_\theta(x) \approx \frac{1}{2N} \sum_{n = 1}^{N} \left( p_\theta(\tilde{x}^{+}_{n}) + p_\theta(\tilde{x}^{-}_{n}) \right) - p_\theta(x) \]

   where \(\tilde{x}^{\pm}_{n} = x \pm z_n\) are symmetric text pairs generated by a paraphrasing model.

Through these improvements, SPV-MIA can detect member records more accurately without relying on a high-quality reference dataset, thereby revealing the privacy-leakage risks of LLMs.
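The probabilistic variation estimate from the formula summary can be sketched in a few lines. This is only an illustration under stated assumptions: `prob_fn` is a hypothetical callable standing in for the target model's probability function, and the "records" are plain floats perturbed by `x ± z` instead of paraphrased text, so that the example runs without any model.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

X = TypeVar("X")


def probabilistic_variation(
    prob_fn: Callable[[X], float],
    x: X,
    neighbour_pairs: Sequence[Tuple[X, X]],
) -> float:
    """Estimate (1 / 2N) * sum_n [p(x_n^+) + p(x_n^-)] - p(x) over N symmetric pairs."""
    if not neighbour_pairs:
        raise ValueError("at least one symmetric neighbour pair is required")
    neighbour_sum = sum(prob_fn(x_plus) + prob_fn(x_minus)
                        for x_plus, x_minus in neighbour_pairs)
    return neighbour_sum / (2 * len(neighbour_pairs)) - prob_fn(x)


def toy_prob(x: float) -> float:
    """Toy probability surface with a single sharp peak at x = 0 (assumed, not an LLM)."""
    return math.exp(-x * x)


def symmetric_pairs(x: float, steps: Sequence[float]) -> list:
    """Build (x + z, x - z) pairs; a real attack would use paraphrased texts instead."""
    return [(x + z, x - z) for z in steps]


# A point sitting on the peak versus a point on the tail of the toy surface.
at_peak = probabilistic_variation(toy_prob, 0.0, symmetric_pairs(0.0, [0.5]))
off_peak = probabilistic_variation(toy_prob, 3.0, symmetric_pairs(3.0, [0.5]))
print(f"at peak: {at_peak:.4f}, off peak: {off_peak:.6f}")
```

In this toy setting the point on the peak gets a clearly negative estimate, because both neighbours have lower probability than the point itself, while the point on the tail gets a value close to zero. In a real attack, `prob_fn` would wrap the target LLM and the neighbour pairs would come from a paraphrasing model; both are assumptions of this sketch rather than details given in the summary above.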

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

Order of Magnitude Speedups for LLM Membership Inference

Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment

Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks

Do Membership Inference Attacks Work on Large Language Models?

Membership Inference Attacks against Language Models via Neighbourhood Comparison

Membership Inference Attacks against Large Vision-Language Models

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)

Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks

Sampling-based Pseudo-Likelihood for Membership Inference Attacks

Membership Inference Attacks Against Self-supervised Speech Models

ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

Detecting Training Data of Large Language Models via Expectation Maximization

Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study

LLM Dataset Inference: Did you train on my dataset?

Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction

Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models

HP-MIA: A Novel Membership Inference Attack Scheme for High Membership Prediction Precision