Unleash the Power of Ellipsis: Accuracy-enhanced Sparse Vector Technique with Exponential Noise

Yuhan Liu, Sheng Wang, Yixuan Liu, Feifei Li, Hong Chen
2024-07-29
Abstract: The Sparse Vector Technique (SVT) is one of the most fundamental tools in differential privacy (DP). It works as a backbone for adaptive data analysis by answering a sequence of queries on a given dataset, and gleaning useful information in a privacy-preserving manner. Unlike the typical private query releases that directly publicize the noisy query results, SVT is less informative -- it keeps the noisy query results to itself and only reveals a binary bit for each query, indicating whether the query result surpasses a predefined threshold. To provide a rigorous DP guarantee for SVT, prior works in the literature adopt a conservative privacy analysis by assuming the direct disclosure of noisy query results as in typical private query releases. This approach, however, hinders SVT from achieving higher query accuracy due to an overestimation of the privacy risks, which further leads to an excessive noise injection using the Laplacian or Gaussian noise for perturbation. Motivated by this, we provide a new privacy analysis for SVT by considering its less informative nature. Our analysis results not only broaden the range of applicable noise types for perturbation in SVT, but also identify the exponential noise as optimal among all evaluated noises (which, however, is usually deemed non-applicable in prior works). The main challenge in applying exponential noise to SVT is mitigating the sub-optimal performance due to the bias introduced by noise distributions. To address this, we develop a utility-oriented optimal threshold correction method and an appending strategy, which enhance the performance of SVT by increasing the precision and recall, respectively. The effectiveness of our proposed methods is substantiated both theoretically and empirically, demonstrating significant improvements of up to $50\%$ across evaluated metrics.
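To make the mechanism described in the abstract concrete, here is a minimal Python sketch of SVT that releases only one bit per query and halts after `cutoff` positive answers. It is not the authors' algorithm. Assumptions: the budget is split evenly (ε1 = ε2 = ε/2), the threshold noise has scale Δ/ε1 and each query noise has scale 2cΔ/ε2 as in a common Laplace-based variant, and the names `laplace_noise`, `exponential_noise`, `sparse_vector`, `mean_shift_correction`, and the `correction` parameter are hypothetical. The exponential branch simply reuses the Laplace scales, and `mean_shift_correction` only cancels the expected noise bias; neither is the paper's privacy calibration or its utility-oriented optimal correction, and the sketch makes no privacy claim for the exponential variant.

```python
import random
from typing import Callable, List, Optional, Sequence

# A noise sampler takes an RNG and a scale and returns one draw.
Sampler = Callable[[random.Random, float], float]


def laplace_noise(rng: random.Random, scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_noise(rng: random.Random, scale: float) -> float:
    # One-sided noise: always >= 0 with mean `scale`, so it is biased.
    return rng.expovariate(1.0 / scale)


def sparse_vector(
    queries: Sequence[float],
    threshold: float,
    epsilon: float,
    cutoff: int,
    sensitivity: float = 1.0,
    noise: Sampler = laplace_noise,
    correction: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Return one bit per query (True = above threshold), halting after `cutoff` Trues.

    Hypothetical sketch: epsilon is split evenly, the threshold noise has scale
    sensitivity/eps1 and each query noise has scale 2*cutoff*sensitivity/eps2.
    """
    rng = rng or random.Random()
    eps1 = epsilon / 2
    eps2 = epsilon - eps1
    # The noisy threshold is drawn once; `correction` is a hypothetical shift.
    noisy_threshold = threshold + noise(rng, sensitivity / eps1) + correction
    query_scale = 2 * cutoff * sensitivity / eps2
    answers: List[bool] = []
    positives = 0
    for q in queries:
        # Fresh noise per query; only the comparison bit is released.
        above = q + noise(rng, query_scale) >= noisy_threshold
        answers.append(above)
        if above:
            positives += 1
            if positives == cutoff:
                break
    return answers


def mean_shift_correction(epsilon: float, cutoff: int, sensitivity: float = 1.0) -> float:
    # Hypothetical: with exponential noise on both sides, raise the threshold by
    # E[query noise] - E[threshold noise] so the comparison is unbiased on average.
    eps1 = epsilon / 2
    eps2 = epsilon - eps1
    return 2 * cutoff * sensitivity / eps2 - sensitivity / eps1


if __name__ == "__main__":
    rng = random.Random(0)
    # Queries far from the threshold make the bits predictable: halts after 2 Trues.
    print(sparse_vector([1e6, -1e6, 1e6, 1e6], 0.0, 1.0, cutoff=2, rng=rng))  # [True, False, True]
    print(sparse_vector([3.0, -2.0, 7.0, 0.5], 0.0, 1.0, cutoff=2,
                        noise=exponential_noise,
                        correction=mean_shift_correction(1.0, 2), rng=rng))
```

With `noise=exponential_noise` every draw is non-negative, so in this sketch an uncorrected comparison is skewed toward ⊤ by roughly the difference of the two noise means (6 for ε = 1, c = 2, Δ = 1). That skew is the kind of bias a threshold correction targets; the paper chooses its correction term by optimizing utility rather than by matching means.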
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
### What problem does this paper attempt to solve?

This paper addresses the low query accuracy of the Sparse Vector Technique (SVT) in Differential Privacy (DP). Existing SVT methods rely on a conservative privacy analysis that overestimates the privacy risk, so they inject more noise than necessary, which reduces query accuracy.

#### Main problems:

1. **Conservative privacy analysis**: Traditional analyses assume that SVT directly discloses its noisy query results, although it only reveals one bit per query. This overestimates the privacy risk and leads to excessive noise injection with Laplace or Gaussian perturbation, which hurts query accuracy.
2. **Limited noise selection**: Existing privacy analyses restrict the types of noise that SVT can use, in particular ruling out potentially better distributions such as exponential noise.

To solve these problems, the authors propose a new privacy analysis that captures SVT's less informative output and admits a wider range of noise types, notably exponential noise. They also develop two strategies to further improve the performance of SVT:

- **Optimal threshold correction method**: Optimizing the threshold correction term reduces the bias introduced by exponential noise, which improves precision.
- **Appending strategy**: Queries that received a negative (\(\bot\)) answer are appended back to the query queue for an additional round of evaluation, which improves recall.

Together, these improvements significantly increase SVT's query accuracy while maintaining a rigorous privacy guarantee.

### Formula summary

1. **Definition of differential privacy**: For any two adjacent datasets \( D \) and \( D' \), and any set of outputs \( o\subseteq O \):
   \[ \Pr[M(D)\in o]\leq e^{\varepsilon}\cdot\Pr[M(D')\in o]+\delta \]
2. **Lipschitz conditions on the noise distributions**:
   \[ |\ln f_1(x) - \ln f_1(x + b_1)|\leq k_1|b_1| \]
   \[ |\ln(1 - F_2(x)) - \ln(1 - F_2(x + b_2))|\leq k_2|b_2| \]
   where \( f_1(\cdot) \) is the probability density function of the noise distribution \( N_1 \) and \( F_2(\cdot) \) is the cumulative distribution function of the noise distribution \( N_2 \), so \( 1 - F_2 \) is its survival function.
3. **\((\alpha,\beta)\)-accuracy**: With probability at least \( 1-\beta \),
   \[ q_i(D)\geq T-\alpha \quad \text{for all } a_i = \top, \]
   \[ q_i(D)\leq T+\alpha \quad \text{for all } a_i = \bot. \]

Through this new analysis and these two strategies, the paper shows how to significantly improve the query accuracy of SVT while ensuring privacy.
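The Lipschitz-type conditions can be probed numerically for concrete noise families. The sketch below is an illustration, not the paper's analysis: it estimates the constant \(k\) as the largest ratio \(|\ln g(x) - \ln g(x+b)| / |b|\) over a finite grid and a single shift, parameterizing both distributions by a scale \(b\) (the exponential is taken with mean \(b\)). All function names are hypothetical. For Laplace noise (density and survival function) and for the exponential survival function the estimate is about \(1/b\); the exponential density vanishes for \(x < 0\), so its log-density is unbounded on a grid that includes negative points. How the paper's analysis treats the support of each noise is not reproduced here.

```python
import math
from typing import Callable, Sequence


def laplace_pdf(x: float, b: float) -> float:
    # Density of Laplace(0, b).
    return math.exp(-abs(x) / b) / (2 * b)


def laplace_sf(x: float, b: float) -> float:
    # Survival function 1 - F(x) of Laplace(0, b).
    return 0.5 * math.exp(-x / b) if x >= 0 else 1.0 - 0.5 * math.exp(x / b)


def exponential_pdf(x: float, b: float) -> float:
    # Density of the exponential distribution with mean b (zero for x < 0).
    return math.exp(-x / b) / b if x >= 0 else 0.0


def exponential_sf(x: float, b: float) -> float:
    # Survival function 1 - F(x) of the exponential distribution with mean b.
    return math.exp(-x / b) if x >= 0 else 1.0


def log_lipschitz(g: Callable[[float], float], xs: Sequence[float], shift: float) -> float:
    """Estimate k = max |ln g(x) - ln g(x + shift)| / |shift| over the grid xs."""
    worst = 0.0
    for x in xs:
        a, c = g(x), g(x + shift)
        if a <= 0.0 or c <= 0.0:
            return math.inf  # ln g is unbounded where g vanishes
        worst = max(worst, abs(math.log(a) - math.log(c)) / abs(shift))
    return worst


if __name__ == "__main__":
    b, shift = 2.0, 0.5
    xs = [i * 0.1 for i in range(-100, 101)]  # grid on [-10, 10]
    for name, g in [
        ("Laplace pdf", lambda x: laplace_pdf(x, b)),
        ("Laplace sf", lambda x: laplace_sf(x, b)),
        ("exponential pdf", lambda x: exponential_pdf(x, b)),
        ("exponential sf", lambda x: exponential_sf(x, b)),
    ]:
        print(f"{name}: k ~ {log_lipschitz(g, xs, shift):.3f}")
```

A grid estimate is only a lower bound on the true constant over the whole real line, so it can confirm the \(1/b\) value for these closed-form families but cannot prove a condition holds for an arbitrary distribution.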
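The \((\alpha,\beta)\)-accuracy definition can be checked mechanically on SVT outputs: a single run satisfies the \(\alpha\) condition if no ⊤ answer lands on a query below \(T-\alpha\) and no ⊥ answer lands on a query above \(T+\alpha\), and \(\beta\) can be estimated as the fraction of independent runs that violate it. The sketch below does this and also computes precision and recall, the two metrics the threshold correction and the appending strategy target. The helper names are hypothetical, and the metric definitions are assumptions: a query counts as truly positive iff \(q_i(D) \geq T\), and truly positive queries left unanswered after SVT halts count as misses.

```python
from typing import Callable, List, Sequence, Tuple


def satisfies_alpha(true_vals: Sequence[float], answers: Sequence[bool],
                    threshold: float, alpha: float) -> bool:
    # Answers may be shorter than true_vals because SVT halts after c positives.
    for q, a in zip(true_vals, answers):
        if a and q < threshold - alpha:
            return False  # a top answer on a query far below T
        if not a and q > threshold + alpha:
            return False  # a bottom answer on a query far above T
    return True


def empirical_beta(mechanism: Callable[[], List[bool]], true_vals: Sequence[float],
                   threshold: float, alpha: float, trials: int = 1000) -> float:
    # Fraction of independent runs that violate the alpha condition.
    failures = sum(not satisfies_alpha(true_vals, mechanism(), threshold, alpha)
                   for _ in range(trials))
    return failures / trials


def precision_recall(true_vals: Sequence[float], answers: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    # Assumed metric: a query is truly positive iff q >= threshold; truly
    # positive queries left unanswered after SVT halts count as misses.
    tp = sum(1 for q, a in zip(true_vals, answers) if a and q >= threshold)
    fp = sum(1 for q, a in zip(true_vals, answers) if a and q < threshold)
    fn = sum(1 for q in true_vals if q >= threshold) - tp
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


if __name__ == "__main__":
    vals = [5.0, -5.0, 5.0, -5.0]
    print(satisfies_alpha(vals, [True, False, True, False], 0.0, 1.0))  # True
    print(precision_recall(vals, [True, True, False], 0.0))  # (0.5, 0.5)
```

`empirical_beta` accepts any zero-argument callable, so a randomized SVT run can be wrapped in a `lambda` that closes over its inputs and a shared RNG.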