Statistical Quantification of Differential Privacy: A Local Approach

Önder Askin, Tim Kutta, Holger Dette
DOI: https://doi.org/10.48550/arXiv.2108.09528
2022-05-02
Abstract: In this work, we introduce a new approach for statistical quantification of differential privacy in a black box setting. We present estimators and confidence intervals for the optimal privacy parameter of a randomized algorithm $A$, as well as other key variables (such as the "data-centric privacy level"). Our estimators are based on a local characterization of privacy and, in contrast to the related literature, avoid the process of "event selection", a major obstacle to privacy validation. This makes our methods easy to implement and user-friendly. We show fast convergence rates of the estimators and asymptotic validity of the confidence intervals. An experimental study of various algorithms confirms the efficacy of our approach.
Cryptography and Security, Statistics Theory, Methodology
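The abstract's point that event selection is a major obstacle can be made concrete with a small sketch. It assumes a hypothetical Laplace mechanism, which is not taken from the paper: a sensitivity-1 query with answers q(x) = 0 and q(x') = 1 on two adjacent databases, plus Laplace noise of scale 1/ε. The sketch computes the event-based loss |ln(P(A(x) ∈ E) / P(A(x') ∈ E))| in closed form for one-sided events E = [a, ∞). The names `laplace_upper_tail` and `event_loss` are illustrative only.

```python
import math


def laplace_upper_tail(a, loc, scale):
    """P(Y >= a) for Y ~ Laplace(loc, scale), computed in closed form."""
    if a >= loc:
        return 0.5 * math.exp(-(a - loc) / scale)
    return 1.0 - 0.5 * math.exp(-(loc - a) / scale)


def event_loss(a, epsilon):
    """Event-based privacy loss |ln(P(A(x) in E) / P(A(x') in E))| for E = [a, inf).

    Hypothetical setup: A adds Laplace(1/epsilon) noise to a sensitivity-1 query
    whose answers on the adjacent databases are q(x) = 0 and q(x') = 1.
    """
    scale = 1.0 / epsilon
    p_x = laplace_upper_tail(a, 0.0, scale)
    p_xp = laplace_upper_tail(a, 1.0, scale)
    return abs(math.log(p_x / p_xp))


if __name__ == "__main__":
    # The loss depends strongly on which event is tested.
    for a in (-1.0, 0.5, 2.0, 5.0):
        print(f"E = [{a}, inf): L = {event_loss(a, epsilon=1.0):.4f}")
```

With ε = 1, only events lying entirely in the tail (here a ≥ 1) attain the true value 1. The event E = [0.5, ∞) gives about 0.83, and E = [-1, ∞) gives about 0.13. A test built on a badly chosen event would therefore understate the privacy loss.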
What problem does this paper attempt to address?
The main problem this paper addresses is the development of a new statistical method for quantifying the differential privacy (DP) of randomized algorithms in a black-box setting. Specifically, the paper aims to:

1. **Provide an evaluation method that requires no knowledge of internal structure**: Many existing methods for verifying differential privacy rely on an understanding of the algorithm's internal structure, which is often unavailable in practice. This paper proposes a black-box evaluation method based solely on output samples of the algorithm, so neither source code nor design details are needed.
2. **Avoid the major obstacle of "event selection"**: Traditional methods for verifying differential privacy usually need to select specific events on which to test the privacy condition, and choosing them well is difficult in practice. This paper instead uses a local loss function to estimate the optimal privacy parameter \(\epsilon\) directly, bypassing event selection.
3. **Fast convergence and valid confidence intervals**: The paper proposes new estimators, proves that they converge quickly, and provides asymptotically valid confidence intervals. Users can therefore quantify their uncertainty about an algorithm's privacy level.

### Specific Problem Description

The paper addresses the following specific problems:

- **How to evaluate the differential privacy level of an algorithm without knowing its internal structure**: By introducing estimators based on local loss functions, the paper provides a black-box evaluation method that can assess the privacy properties of an algorithm without access to its source code.
- **How to avoid the complex event-selection process**: Checking the privacy condition exactly would require examining every measurable event, which is infeasible. Traditional methods must therefore pick a few candidate events.
The method proposed in this paper simplifies this process by directly estimating the maximum of the local loss function, so no event has to be chosen.
- **How to provide statistical inference tools for privacy parameters**: Beyond point estimators, the paper develops the Maximum Privacy Loss (MPL) algorithm, which produces one-sided confidence intervals that are asymptotically valid.

### Key Formulas

The key formulas in the paper are:

- Definition of differential privacy: \(A\) is \(\epsilon\)-differentially private if
  \[ P(A(x) \in E) \leq e^{\epsilon} P(A(x') \in E) \]
  for all adjacent databases \(x\) and \(x'\) and every measurable event \(E\).
- Data-specific privacy loss:
  \[ \epsilon_{x,x'} := \sup_E L_{x,x'}(E), \]
  where \(L_{x,x'}(E) = \left|\ln\left(\frac{P(A(x) \in E)}{P(A(x') \in E)}\right)\right|\).
- Local loss function:
  \[ \ell_{x,x'}(t) := \left|\ln(f_x(t)) - \ln(f_{x'}(t))\right|, \]
  where \(f_x\) and \(f_{x'}\) are the output densities of the algorithm on databases \(x\) and \(x'\), respectively.

The link between the last two formulas is what removes event selection. Suppose the output densities are, for example, continuous and positive. Then \(\epsilon_{x,x'} = \sup_t \ell_{x,x'}(t)\). This holds because the probability ratio of any event is a weighted average of the pointwise density ratios \(f_x(t)/f_{x'}(t)\), and shrinking neighborhoods of \(t\) recover each individual ratio. With these definitions, the paper turns the evaluation of differential privacy into a more tractable statistical estimation problem that supports privacy verification in practice.
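The plug-in idea behind the local loss function can be sketched as follows. This is a minimal illustration, not the authors' estimator, and it omits the MPL confidence-interval step. It assumes a hypothetical Laplace mechanism treated as a black box: a sensitivity-1 query with q(x) = 0 and q(x') = 1, plus Laplace noise of scale 1/ε. It replaces f_x and f_{x'} with Gaussian kernel density estimates using a fixed bandwidth of 0.25 (an assumption), then maximizes the estimated ℓ over a finite grid. All function names and parameter values are illustrative.

```python
import math
import random


def laplace_mechanism(rng, query_value, epsilon, n):
    """Hypothetical black box: n independent runs of A = q + Laplace(1/epsilon) noise.

    A Laplace(0, b) variate is generated as b * (E1 - E2) with E1, E2 ~ Exp(1).
    """
    scale = 1.0 / epsilon
    return [query_value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for _ in range(n)]


def gaussian_kde(samples, t, bandwidth):
    """Gaussian kernel density estimate of the output density at the point t."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples)


def local_loss_estimate(out_x, out_xp, grid, bandwidth):
    """Plug-in estimate of sup_t |ln f_x(t) - ln f_x'(t)|, maximized over a finite grid."""
    best = 0.0
    for t in grid:
        f_x = gaussian_kde(out_x, t, bandwidth)
        f_xp = gaussian_kde(out_xp, t, bandwidth)
        if f_x > 0.0 and f_xp > 0.0:  # skip points where either estimate underflows
            best = max(best, abs(math.log(f_x) - math.log(f_xp)))
    return best


def run(epsilon, n=5000, seed=0, bandwidth=0.25):
    """Estimate epsilon for adjacent databases with q(x) = 0 and q(x') = 1."""
    rng = random.Random(seed)
    out_x = laplace_mechanism(rng, 0.0, epsilon, n)
    out_xp = laplace_mechanism(rng, 1.0, epsilon, n)
    # Evaluate only on [-1.5, 2.5], where both densities have enough samples;
    # far in the tails the kernel estimates (and their log-ratio) become noisy.
    grid = [-1.5 + 0.05 * k for k in range(81)]
    return local_loss_estimate(out_x, out_xp, grid, bandwidth)


if __name__ == "__main__":
    for eps in (0.5, 1.0):
        print(f"true epsilon = {eps}, local estimate = {run(eps):.3f}")
```

Smoothing with a nonnegative kernel preserves the pointwise bound f_x ≤ e^ε f_{x'}. As a result, the population version of this plug-in cannot exceed ε, and any overshoot comes from sampling noise, which grows in the tails. That is why the grid is truncated. The bandwidth and grid range here are illustrative choices, not tuned values.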