Abstract: Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks -- which distinguish the member and non-member distributions without looking at any trained model -- outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model's training data.

What problem does this paper attempt to address?

This paper addresses flaws in how Membership Inference attacks (MI attacks) against foundation models are currently evaluated. Specifically, the authors point out:

1. **Inconsistent distribution of member and non-member data**: In many existing MI attack evaluations, member and non-member data are sampled from different distributions. For example, in some datasets, member data comes from before a specific cutoff date, while non-member data comes from after it. This temporal difference allows blind attacks to distinguish members from non-members without access to the trained model.
2. **Unreliable existing evaluation methods**: Because of this distribution mismatch, existing MI attack evaluations cannot show whether the model has leaked information about its training data. Using simple blind attacks (such as date-based detection and bag-of-words classifiers), the authors show that these baselines outperform existing state-of-the-art MI attack methods on multiple publicly available MI evaluation datasets. This indicates that existing MI attacks may not be extracting membership information from the model at all, but may instead be exploiting differences in data distribution.
3. **Proposed improvement directions**: To evaluate MI attacks more reliably, the authors suggest using datasets with a clear split between training set and test set, such as The Pile or DataComp. Such datasets ensure that member and non-member data come from the same distribution, which avoids the distribution shift problem.

### Main contributions

- **Reveal the flaws of existing evaluation methods**: By analyzing 8 publicly available MI evaluation datasets, the authors demonstrate the distribution shift in these datasets and show that blind attacks outperform published MI attacks on them.
- **Provide improvement solutions**: Future research should evaluate MI attacks on datasets whose training and test sets are randomly split, so that evaluation results are reliable.

### Formula representation

The evaluation metrics involved in the paper include:

- **TPR@FPR (True Positive Rate at False Positive Rate)**: The True Positive Rate measured at the decision threshold where the False Positive Rate equals a chosen (typically low) value.

\[ \text{TPR}=\frac{\text{TP}}{\text{TP}+\text{FN}}, \qquad \text{FPR}=\frac{\text{FP}}{\text{FP}+\text{TN}} \]

- **AUC ROC (Area Under the Receiver Operating Characteristic Curve)**: The area under the curve that plots TPR against FPR across all thresholds, used to measure the overall performance of a classifier.

\[ \text{AUC ROC}=\int_{0}^{1}\text{TPR}(\text{FPR})\,d(\text{FPR}) \]

These metrics are used to evaluate the effectiveness of MI attacks and to compare them with the results of blind attacks.

### Summary

This paper reveals serious flaws in current MI attack evaluation methods and proposes an improved evaluation approach. With datasets that have a clear split between training set and test set, future MI attack evaluations will be more reliable and will better reflect how much membership information a model actually leaks.
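To make the date-based blind attack and the two metrics concrete, here is a minimal sketch in plain Python. It is not the authors' code: the synthetic publication years, the year ranges, and the helper names (`blind_score`, `roc_points`, `auc_roc`, `tpr_at_fpr`) are all assumptions chosen for illustration. The point it illustrates is that a score computed from the sample alone, with no model involved, can already separate members from non-members when the two groups come from different time periods.

```python
import random


def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low score."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = order[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(order) and order[i][0] == current:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_roc(points):
    """Trapezoidal approximation of the area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr):
    """Largest TPR among thresholds whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


def blind_score(year):
    """Hypothetical blind attack: older samples are scored as more likely members.

    No trained model is queried; only the sample's own date is used.
    """
    return -year


# Synthetic data (an assumption): members predate a cutoff, non-members follow it,
# with a small overlap in 2023.
rng = random.Random(0)
member_years = [rng.randint(2015, 2023) for _ in range(500)]
nonmember_years = [rng.randint(2023, 2024) for _ in range(500)]

labels = [1] * len(member_years) + [0] * len(nonmember_years)
scores = [blind_score(y) for y in member_years + nonmember_years]

points = roc_points(scores, labels)
auc = auc_roc(points)
tpr = tpr_at_fpr(points, 0.01)
print(f"blind attack: AUC ROC = {auc:.3f}, TPR@1%FPR = {tpr:.3f}")
```

On data like this, the blind score reaches a high AUC ROC purely because of the temporal shift. If members and non-members were drawn from the same period, as with a random train-test split, the same score would carry no membership signal.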

Blind Baselines Beat Membership Inference Attacks for Foundation Models

Practical Blind Membership Inference Attack via Differential Comparisons

On the Discredibility of Membership Inference Attacks

Membership Inference Attacks against Language Models via Neighbourhood Comparison

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Membership Inference Attacks and Defenses in Classification Models

Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models

MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training

Membership reconstruction attack in deep neural networks

A Method to Facilitate Membership Inference Attacks in Deep Learning Models

Can Membership Inferencing be Refuted?

Black-box Membership Inference Attacks against Fine-tuned Diffusion Models

Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

Defenses to Membership Inference Attacks: A Survey

Subject-Level Membership Inference Attack via Data Augmentation and Model Discrepancy

When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Membership Inference Attacks on Machine Learning: A Survey

Defending Against Membership Inference Attacks: RM Learning is All You Need

Membership Inference via Backdooring