Abstract: For small privacy parameter $\epsilon$, $\epsilon$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g. $\epsilon \geq 7$), and it has been observed empirically that DP with large $\epsilon$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $\epsilon \geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.
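
The claim that guarantees at $\epsilon \geq 7$ are essentially vacuous can be made concrete with the standard hypothesis-testing view of $(\epsilon, \delta)$-DP (textbook background, not this paper's PMP analysis): for any membership test, $\mathrm{TPR} \leq e^{\epsilon}\,\mathrm{FPR} + \delta$ and $1 - \mathrm{FPR} \leq e^{\epsilon}(1 - \mathrm{TPR}) + \delta$. When membership and non-membership are equally likely a priori, this caps any attack's accuracy at $(e^{\epsilon} + \delta)/(1 + e^{\epsilon})$. The Python sketch below evaluates that worst-case cap; the function name `max_mia_accuracy` is hypothetical.

```python
import math


def max_mia_accuracy(epsilon: float, delta: float = 0.0) -> float:
    """Worst-case cap on the accuracy of any membership inference attack
    against an (epsilon, delta)-DP mechanism, assuming a 50/50 prior on
    whether the target record is a member.

    Solving TPR <= e^eps * FPR + delta and 1 - FPR <= e^eps * (1 - TPR) + delta
    gives accuracy <= (e^eps + delta) / (1 + e^eps).
    """
    if epsilon < 0 or not 0.0 <= delta <= 1.0:
        raise ValueError("need epsilon >= 0 and 0 <= delta <= 1")
    # Same quantity rewritten with e^-eps so very large epsilon cannot overflow.
    t = math.exp(-epsilon)
    return 1.0 - (1.0 - delta) * t / (1.0 + t)


for eps in (0.1, 1.0, 7.0):
    print(f"epsilon = {eps}: accuracy <= {max_mia_accuracy(eps):.4f}")
# epsilon = 0.1: accuracy <= 0.5250
# epsilon = 1.0: accuracy <= 0.7311
# epsilon = 7.0: accuracy <= 0.9991
```

At $\epsilon = 7$ the bound only excludes attacks that are more than about 99.9% accurate, which is why worst-case analysis alone says little in this regime; PMP instead models the attacker's uncertainty about the rest of the data set.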

What problem does this paper attempt to address?

This paper aims to explain why, in practical applications, differential privacy (DP) with a large privacy parameter $\varepsilon$ can successfully resist membership inference attacks (MIAs). Specifically, it tries to bridge the gap between theory and practice by understanding why a large value of $\varepsilon$ can still prevent practical MIAs.

#### Background and Motivation

1. **Basic Concepts of Differential Privacy**:
   - For a small $\varepsilon$ (for example, $\varepsilon < 1$), $\varepsilon$-differential privacy provides a strong worst-case guarantee: no membership inference attack can reliably determine whether an individual's data was used to train a machine-learning model.
   - This guarantee rests on a worst-case assumption: even an attacker who already knows every data record except one individual's cannot infer whether that individual's data was used for training.
2. **Challenges in Practical Applications**:
   - In practice, this worst-case guarantee may be overly strict. Real attackers usually lack precise knowledge of (nearly all of) the private data, and some datasets may be easier to defend than the worst-case dataset.
   - For this reason, industry often deploys DP models with large privacy parameters (for example, $\varepsilon \geq 7$), and these models have been observed empirically to resist state-of-the-art MIAs.
3. **Contradiction between Theory and Practice**:
   - Existing DP theory cannot explain these empirical findings. For example, when $\varepsilon \geq 7$, the theoretical privacy guarantee is essentially vacuous, yet in practice such models effectively defend against MIAs.

#### Core Problems of the Paper

To bridge the gap between theory and practice, the paper proposes a new privacy notion, **Practical Membership Privacy (PMP)**.
The PMP model accounts for a practical attacker's uncertainty, that is, the attacker's limited knowledge of the dataset's contents. By introducing PMP, the paper attempts to answer the following question:

- **Why can a large $\varepsilon$ prevent practical MIAs?**

### Main Contributions

1. **Proposing the PMP Concept**:
   - PMP is a new privacy definition that models a practical attacker's partial and distributional knowledge of the dataset's contents.
   - The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given dataset.
2. **Analyzing the Relationship between PMP and DP**:
   - The paper quantitatively analyzes the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and the Gaussian mechanism.
   - The analysis shows that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs.
3. **Providing Practical Guidance**:
   - Based on these results, the paper offers principled guidance for practitioners choosing the DP parameter, helping them balance privacy and utility in practice.

### Conclusion

By introducing the PMP concept, the paper explains why a large value of $\varepsilon$ can effectively resist MIAs in practical applications. This finding not only helps bridge the gap between theory and practice but also provides new directions and tools for future privacy research.

### Formula Summary

- **Differential Privacy** ($(\varepsilon, \delta)$-DP): for all adjacent datasets $D, D' \in X^n$ and all measurable subsets $S \subseteq Z$,
  \[
  P(A(D) \in S) \leq e^{\varepsilon} P(A(D') \in S) + \delta
  \]
- **Practical Membership Privacy (Practic
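
The Formula Summary gives the $(\varepsilon, \delta)$-DP condition, and the Gaussian mechanism is one of the two mechanisms the paper analyzes. The familiar calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ is only proven for $\varepsilon < 1$, so it does not cover the large-$\varepsilon$ regime discussed here. The Python sketch below instead uses the exact condition of the analytic Gaussian mechanism (Balle and Wang, 2018): adding $\mathcal{N}(0, \sigma^2)$ noise to a query with $L_2$ sensitivity $\Delta$ is $(\varepsilon, \delta)$-DP if and only if $\Phi\left(\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) - e^{\varepsilon}\Phi\left(-\frac{\Delta}{2\sigma} - \frac{\varepsilon\sigma}{\Delta}\right) \leq \delta$, and it finds the smallest such $\sigma$ by bisection. This is standard DP background rather than the paper's PMP analysis; all function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    # Phi(x) via erfc, which stays accurate in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(sigma: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Smallest delta for which N(0, sigma^2) noise gives (epsilon, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(epsilon) * std_normal_cdf(-a - b)


def calibrate_sigma(epsilon: float, delta: float, sensitivity: float = 1.0,
                    rel_tol: float = 1e-9) -> float:
    """Smallest sigma (up to rel_tol) with gaussian_delta(sigma) <= delta.

    gaussian_delta is decreasing in sigma, so bisection applies.
    """
    if not (epsilon > 0 and 0 < delta < 1 and sensitivity > 0):
        raise ValueError("need epsilon > 0, 0 < delta < 1, sensitivity > 0")
    lo, hi = 1e-6 * sensitivity, sensitivity
    # Grow the upper end until it satisfies the DP condition.
    while gaussian_delta(hi, epsilon, sensitivity) > delta:
        hi *= 2.0
    # Invariant: hi satisfies the condition, lo does not.
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, epsilon, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def gaussian_mechanism(value: float, sigma: float, rng: random.Random) -> float:
    """Release value + N(0, sigma^2) noise."""
    return value + rng.gauss(0.0, sigma)


rng = random.Random(0)
for eps in (1.0, 7.0):
    sigma = calibrate_sigma(eps, delta=1e-5)
    print(f"epsilon = {eps}: sigma = {sigma:.3f}, "
          f"noisy count = {gaussian_mechanism(100.0, sigma, rng):.1f}")
```

Comparing the two noise scales shows how much less noise the $\varepsilon = 7$ setting adds than $\varepsilon = 1$ at the same $\delta$; the paper's analysis suggests such large-$\varepsilon$ mechanisms can often still have a much smaller PMP parameter.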

Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?
