Approximation of Pufferfish Privacy for Gaussian Priors

Ni Ding
2024-05-07
Abstract: This paper studies how to approximate pufferfish privacy when the adversary's prior belief about the published data is Gaussian distributed. Using Monge's optimal transport plan, we show that $(\epsilon, \delta)$-pufferfish privacy is attained if the additive Laplace noise is calibrated to the differences in mean and variance of the Gaussian distributions conditioned on every discriminative secret pair. A typical application is the private release of the summation (or average) query, for which sufficient conditions are derived for approximating $\epsilon$-statistical indistinguishability in each individual's sensitive data. The result is then extended to arbitrary prior beliefs trained by Gaussian mixture models (GMMs): calibrating Laplace noise to a convex combination of differences in mean and variance between Gaussian components attains $(\epsilon,\delta)$-pufferfish privacy.
Information Theory, Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is how to attain pufferfish privacy under Gaussian priors by adding suitably calibrated Laplace noise when data is released. Specifically, when the adversary's prior belief is a Gaussian distribution, the paper studies how to calibrate the Laplace noise so that \((\epsilon, \delta)\)-pufferfish privacy holds. The main contributions are:

1. **Gaussian-distributed data conditioned on each secret instance**: Using Monge's optimal transport plan, the paper proves that \((\epsilon, \delta)\)-pufferfish privacy is attained by adding Laplace noise whose scale parameter \(b\) is calibrated to the differences in mean and variance for each secret pair \((s_i, s_j)\).
2. **Privatization of sum queries in multi-user systems**: Applying the above result, the paper derives sufficient conditions for privatizing sum queries in multi-user systems so that each participant's data is statistically indistinguishable.
3. **GMM model for arbitrary prior distributions**: Assuming the adversary has learned an arbitrary prior through a Gaussian mixture model (GMM), the paper proves that \((\epsilon, \delta)\)-pufferfish privacy is attained by calibrating the scale parameter \(b\) of the Laplace noise to a convex combination of the differences in mean and variance between Gaussian components.

### Formula Summary

- **Calibration of the scale parameter of Laplace noise**: \[ b \geq \frac{1}{\epsilon} \max_{\rho, (s_i, s_j) \in S} \left( | \mu_i - \mu_j | + \tau^*(\delta) | \sigma_i - \sigma_j | \right) \] where \(\tau^*(\delta) = \min \{ \tau : \Pr(Z > \tau) \leq \frac{\delta}{2} \}\) for a standard normal random variable \(Z\), or equivalently \(\tau^*(\delta) = Q^{-1}(\frac{\delta}{2})\), where \(Q(t)\) is the tail probability of the standard normal distribution.
- **Sum queries in multi-user systems**: \[ b \geq \frac{1}{\epsilon} \max_{k \in K} \left( | \mu_k | + \Delta \sigma_k \tau^*(\delta) \right) \] where \(\Delta \sigma_k = \sqrt{\sum_{k' \in K \setminus \{k\}} \sigma_{k'}^2 + \sigma_k^2} - \sqrt{\sum_{k' \in K \setminus \{k\}} \sigma_{k'}^2}\).
- **Noise calibration under GMM prior**: \[ b \geq \frac{1}{\epsilon} \max_{\rho, (s_i, s_j) \in S} \sum_{m, l} w_{ml}^* \left( | \mu_{im} - \mu_{jl} | + \tau^*(\delta) | \sigma_{im} - \sigma_{jl} | \right) \] where \(w_{ml}^*\) are the convex-combination weights over pairs of Gaussian components.

### Experimental Verification

The paper reports experiments on the Adult and Hungarian Heart Disease datasets from the UCI Machine Learning Repository to verify the effectiveness of the proposed method. The results show that, with appropriately calibrated Laplace noise, the required level of privacy protection can be achieved while maintaining data utility.
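
To make the first two bounds in the formula summary concrete, the following is a minimal Python sketch that evaluates them numerically. It is an illustration of the stated inequalities, not the paper's own code: the function names (`tau_star`, `laplace_scale_gaussian`, `laplace_scale_sum_query`, `add_laplace_noise`) are hypothetical, and the caller is assumed to supply every (mean, standard deviation) pair that the maximum over priors \(\rho\) and secret pairs \((s_i, s_j)\) ranges over. The GMM bound is left out because the weights \(w_{ml}^*\) are not specified in this summary.

```python
import math
import random
from statistics import NormalDist


def tau_star(delta: float) -> float:
    """Q^{-1}(delta/2): the threshold tau with Pr(Z > tau) = delta/2, Z standard normal."""
    return NormalDist().inv_cdf(1.0 - delta / 2.0)


def laplace_scale_gaussian(pairs, epsilon: float, delta: float) -> float:
    """Smallest b satisfying the Gaussian-prior bound.

    `pairs` is an iterable of ((mu_i, sigma_i), (mu_j, sigma_j)) tuples,
    assumed to enumerate all priors and discriminative secret pairs.
    """
    t = tau_star(delta)
    worst = max(abs(mi - mj) + t * abs(si - sj) for (mi, si), (mj, sj) in pairs)
    return worst / epsilon


def laplace_scale_sum_query(mus, sigmas, epsilon: float, delta: float) -> float:
    """Smallest b satisfying the sum-query bound for per-user means and std devs."""
    t = tau_star(delta)
    total_var = sum(s * s for s in sigmas)
    worst = 0.0
    for mu_k, sigma_k in zip(mus, sigmas):
        # Variance of the sum over all users except k (clamped against rounding).
        rest_var = max(total_var - sigma_k * sigma_k, 0.0)
        delta_sigma_k = math.sqrt(total_var) - math.sqrt(rest_var)
        worst = max(worst, abs(mu_k) + delta_sigma_k * t)
    return worst / epsilon


def add_laplace_noise(value: float, b: float, rng=None) -> float:
    """Add Laplace(0, b) noise, sampled as the difference of two exponentials with mean b."""
    rng = rng or random.Random()
    return value + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
```

For example, `laplace_scale_sum_query(mus, sigmas, epsilon=1.0, delta=0.05)` returns the smallest scale allowed by the sum-query inequality, and `add_laplace_noise(sum(data), b)` then produces one noisy release. Any larger `b` also satisfies the bound, at the cost of more noise.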