Abstract: We present a data analytics system that ensures accurate counts can be released with differential privacy and minimal onboarding effort while showing instances that outperform other approaches that require more onboarding effort. The primary difference between our proposal and existing approaches is that it does not rely on user contribution bounds over distinct elements, i.e. $\ell_0$-sensitivity bounds, which can significantly bias counts. Contribution bounds for $\ell_0$-sensitivity have been considered necessary to ensure differential privacy, but we show that they are not in fact necessary and that dropping them can lead to releasing more results that are more accurate. We require minimal hyperparameter tuning and demonstrate results on several publicly available datasets. We hope that this approach will help differential privacy scale to many different data analytics applications.

What problem does this paper attempt to address?

The problem this paper addresses is how to release count statistics from a data set efficiently and accurately while preserving privacy. Specifically, the authors propose a method that releases counts with differential privacy guarantees without relying on user contribution bounds (i.e., $\ell_0$-sensitivity bounds). Traditional methods usually cap how many distinct elements each user may contribute, which can introduce bias and reduce the accuracy of the counts. These methods also tend to require substantial parameter tuning and expert knowledge, which makes them hard to automate and to apply across different data analysis scenarios.

### Main contributions of the paper

1. **Avoiding user contribution bounds**:
   - Traditional methods usually rely on $\ell_0$-sensitivity bounds to ensure differential privacy, but such bounds can significantly bias the count results. The method proposed in this paper does not require them, which reduces bias and improves the accuracy of the counts.
2. **Simplifying parameter tuning**:
   - The new method requires almost no manual hyperparameter tuning; only the total privacy budget needs to be set. This makes the method easier to automate and applicable to a variety of data analysis tasks.
3. **Efficient count release**:
   - Using the Unknown Domain Gumbel mechanism, the method iteratively finds the elements with the highest counts and adds noise to those counts to ensure differential privacy. As the name indicates, the mechanism does not require the set of possible elements to be known in advance.
4. **Wide applicability**:
   - The method is evaluated on multiple public data sets, including financial, Reddit comment, Wikipedia, and MovieLens data sets, to show that it is effective and robust on data of different scales and types.
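The bias that $\ell_0$ contribution bounds can introduce is easy to see on a toy example. The following is a minimal sketch, not the paper's system: the data, the function names `true_counts` and `l0_bounded_counts`, and the deterministic truncation rule are all assumptions made for illustration.

```python
from collections import Counter


def true_counts(user_items):
    """Count, for each element, how many users contributed it."""
    counts = Counter()
    for items in user_items.values():
        counts.update(set(items))
    return counts


def l0_bounded_counts(user_items, max_distinct):
    """Counts after keeping at most `max_distinct` distinct elements per user.

    Assumption: truncation keeps the alphabetically first elements so the
    example is deterministic; a real pipeline might sample instead.
    """
    counts = Counter()
    for items in user_items.values():
        kept = sorted(set(items))[:max_distinct]
        counts.update(kept)
    return counts


# Hypothetical toy data: user id -> distinct elements that user contributed.
toy_users = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["a", "d"],
    "u3": ["c", "d"],
    "u4": ["d"],
}

exact = true_counts(toy_users)
bounded = l0_bounded_counts(toy_users, max_distinct=2)

# Difference introduced by the bound alone, before any noise is added.
bias = {item: bounded[item] - exact[item] for item in sorted(exact)}
print("exact:  ", dict(exact))
print("bounded:", dict(bounded))
print("bias:   ", bias)
```

In this toy data only user `u1` exceeds the bound, yet the counts for `c` and `d` both drop by one. The error comes from the bound itself and would remain even if no noise were added.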
### Formula explanation

- Definition of differential privacy: an algorithm $A: X \to Y$ is $(\epsilon, \delta)$-differentially private if, for any measurable set $S \subseteq Y$ and any adjacent inputs $x \sim x'$,
  \[ \Pr[A(x) \in S] \leq e^\epsilon \Pr[A(x') \in S] + \delta. \]
- $\ell_p$-sensitivity of a function $f$:
  \[ \Delta_p(f) = \max_{x \sim x'} \left\| f(x) - f(x') \right\|_p. \]
- Gaussian mechanism for a function $f$ with $d$-dimensional output, where the $Z_i$ are independent and $\rho > 0$ is the privacy parameter (this noise scale corresponds to $\rho$-zero-concentrated differential privacy):
  \[ M(x) = f(x) + (Z_1, \ldots, Z_d), \quad Z_i \sim N\left(0, \frac{\Delta_2(f)^2}{2\rho}\right). \]

### Conclusion

This paper proposes a differentially private count release method that provides more accurate counts without relying on user contribution bounds and requires almost no manual hyperparameter tuning. The method applies to a variety of data sets and is broadly useful in settings where privacy must be protected.
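The Gaussian mechanism from the formula explanation can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names `gaussian_sigma` and `gaussian_mechanism`, the example counts, and the sensitivity value are hypothetical, and Python's `random` module is used only for readability.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, rho):
    """Standard deviation for noise with variance Delta_2(f)^2 / (2 * rho)."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return l2_sensitivity / math.sqrt(2.0 * rho)


def gaussian_mechanism(values, l2_sensitivity, rho, rng=None):
    """Add independent Gaussian noise to each coordinate of `values`.

    Assumption: `rng` is a `random.Random` instance. It is not a
    cryptographically secure source, so this is for illustration only.
    """
    rng = rng or random.Random()
    sigma = gaussian_sigma(l2_sensitivity, rho)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical counts and an assumed l2-sensitivity of 1.
counts = [120.0, 45.0, 7.0]
noisy = gaussian_mechanism(counts, l2_sensitivity=1.0, rho=0.5, rng=random.Random(0))
print(noisy)
```

With $\Delta_2(f) = 1$ and $\rho = 0.5$ the variance is $1 / (2 \cdot 0.5) = 1$, so each count receives noise with standard deviation 1. Halving $\rho$ doubles the variance, which is the usual trade-off between privacy and accuracy.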

Private Count Release: A Simple and Scalable Approach for Private Data Analytics

Constrained Differential Privacy for Count Data

Slowly Scaling Per-Record Differential Privacy

Gradual Release of Sensitive Data under Differential Privacy

Almost Tight Error Bounds on Differentially Private Continual Counting

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

On Learning Cluster Coefficient of Private Networks

Private Graph Data Release: A Survey

Private measures, random walks, and synthetic data

Differential Privacy for the Analyst via Private Equilibrium Computation

Counting Distinct Elements Under Person-Level Differential Privacy

Private Counting of Distinct Elements in the Turnstile Model and Extensions

Privacy Profiles for Private Selection

Differentially Private Synthetic Data with Private Density Estimation

Differential Privacy on Dynamic Data

A New Analysis of Differential Privacy's Generalization Guarantees

Privacy accounting $\varepsilon$conomics: Improving differential privacy composition via a posteriori bounds

Privately Answering Queries on Skewed Data via Per Record Differential Privacy

A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy

Differentially Private Spatial Decompositions
