Abstract: An ability to share data, even in aggregated form, is critical to advancing both conventional and data science. However, insofar as such datasets are comprised of individuals, their membership in these datasets is often viewed as sensitive, with membership inference attacks (MIAs) threatening to violate their privacy. We propose a Bayesian game model for privacy-preserving publishing of data-sharing mechanism outputs (for example, summary statistics for sharing genomic data). In this game, the defender minimizes a combination of expected utility and privacy loss, with the latter being maximized by a Bayes-rational attacker. We propose a GAN-style algorithm to approximate a Bayes-Nash equilibrium of this game, and introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk, which aim to optimally balance the defender's privacy and utility in a way that is robust to the attacker's heterogeneous preferences with respect to true and false positives. We demonstrate the properties of composition and post-processing for BGP risk and establish conditions under which BNGP and pure differential privacy (PDP) are equivalent. We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data. Theoretical analysis and empirical results demonstrate that our Bayesian game-theoretic method outperforms state-of-the-art approaches for privacy-preserving sharing of summary statistics.

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to protect the privacy of individuals when sharing data (such as aggregated statistics) against Membership Inference Attacks (MIAs). Specifically, the paper proposes a method based on a Bayesian game model to balance the trade-off between privacy protection and data utility.

### Problem Background

Membership Inference Attacks (MIAs) exploit vulnerabilities in data analysis and machine learning to determine whether an individual's data is included in a given dataset (such as a training dataset). This type of attack poses a serious privacy risk in sensitive areas such as medicine, biometrics, location-based services, social media, and finance. However, MIAs can also be used as an auditing tool to assess privacy risks.

### Core Problem of the Paper

Existing strategies for mitigating the privacy risks posed by MIAs include noise perturbation and Differential Privacy (DP). These methods reduce information leakage by introducing randomness, but the added uncertainty inevitably degrades data utility. The key issue is therefore how to preserve as much data utility as possible while protecting privacy.

### Solution

The paper models the privacy-utility trade-off as a Bayesian game between a defender and an attacker. Specifically:

1. **Bayesian Game Model**:
   - **Defender**: The goal is to minimize the expected privacy loss while maintaining the required data utility.
   - **Attacker**: The goal is to maximize the success rate of membership inference based on its subjective beliefs and preferences.
2. **Generative Adversarial Network (GAN) Algorithm**:
   - A GAN-like algorithm is proposed to approximate the Bayes-Nash equilibrium. The defender's strategy is represented by a neural network generator, which takes the true membership vector and an auxiliary random vector as input and generates a noise vector. The attacker's strategy is represented by a neural network discriminator, which processes the perturbed output and attempts to infer membership information.
3. **Bayes Generative Privacy (BGP) Risk**:
   - The concept of BGP risk is introduced, aiming to optimally balance the defender's privacy and utility while remaining robust to the attacker's heterogeneous preferences over true positives and false positives.
4. **Theoretical Analysis and Empirical Results**:
   - Theoretical analysis shows that the proposed Bayesian game-theoretic method outperforms existing state-of-the-art methods in terms of privacy protection. Empirical experiments support this, especially for sharing genomic summary statistics, where individuals can be re-identified even from aggregated data.

### Formula Examples

- **Membership Advantage** (the attacker's true positive rate minus its false positive rate for individual \(k\)):
  \[ \text{Adv}_k(A) = \Pr[A(d_k, x) = 1 \mid b_k = 1] - \Pr[A(d_k, x) = 1 \mid b_k = 0] \]
- **Bayes-weighted Membership Advantage**:
  \[ \text{Adv}(h_A, \sigma, \theta, \gamma; g_D) = (1 - \gamma) \sum_{k \in U,\, b_{-k}} \Pr[A(d_k, x; h_A, \sigma) = 1 \mid b_k = 1; g_D]\, \theta(b_k = 1, b_{-k}) - \gamma \sum_{k \in U,\, b_{-k}} \Pr[A(d_k, x; h_A, \sigma) = 1 \mid b_k = 0; g_D]\, \theta(b_k = 0, b_{-k}) \]

With this method, the paper provides a robust way to protect privacy while preserving as much data utility as possible.
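The two advantage formulas above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: the function names are hypothetical, and it assumes that each individual's true and false positive rates have already been averaged over the other members' statuses \(b_{-k}\), so that the prior \(\theta\) reduces to a per-individual membership probability.

```python
from typing import Sequence, Tuple


def empirical_rates(predictions: Sequence[int],
                    memberships: Sequence[int]) -> Tuple[float, float]:
    """Estimate (TPR, FPR) of an attacker from binary guesses and true labels."""
    member_guesses = [p for p, b in zip(predictions, memberships) if b == 1]
    non_member_guesses = [p for p, b in zip(predictions, memberships) if b == 0]
    if not member_guesses or not non_member_guesses:
        raise ValueError("need at least one member and one non-member")
    tpr = sum(member_guesses) / len(member_guesses)
    fpr = sum(non_member_guesses) / len(non_member_guesses)
    return tpr, fpr


def membership_advantage(tpr: float, fpr: float) -> float:
    """Adv_k(A) = Pr[A = 1 | b_k = 1] - Pr[A = 1 | b_k = 0]."""
    return tpr - fpr


def bayes_weighted_advantage(tprs: Sequence[float],
                             fprs: Sequence[float],
                             priors: Sequence[float],
                             gamma: float) -> float:
    """Simplified Bayes-weighted advantage (assumption: marginal priors).

    priors[k] stands in for the prior probability that individual k is a
    member; gamma weights false positives against true positives.
    """
    if not (len(tprs) == len(fprs) == len(priors)):
        raise ValueError("tprs, fprs and priors must have equal length")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for tpr, fpr, prior in zip(tprs, fprs, priors):
        total += (1.0 - gamma) * tpr * prior - gamma * fpr * (1.0 - prior)
    return total


# Toy example: six individuals, the first three are true members.
guesses = [1, 1, 0, 1, 0, 0]
labels = [1, 1, 1, 0, 0, 0]
tpr, fpr = empirical_rates(guesses, labels)
adv = membership_advantage(tpr, fpr)
weighted = bayes_weighted_advantage([0.8], [0.3], [0.5], gamma=0.5)
```

With `gamma = 0.5` and a uniform prior, the weighted advantage is a rescaled version of the plain advantage. Moving `gamma` toward 1 penalizes false positives more heavily, which is one way to read the attacker's heterogeneous preferences mentioned in the summary.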

Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks
