Abstract: Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is *data dependent*, meaning the distribution of the noise added to query answers may change depending on the input data. Theoretical analysis typically only considers the worst case, making empirical study of average-case performance increasingly important. In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. Based on these principles we propose DPBench, a novel evaluation framework for standardized evaluation of privacy algorithms. We then apply our benchmark to evaluate algorithms for answering 1- and 2-dimensional range queries. The result is a thorough empirical study of 15 published algorithms on a total of 27 datasets that offers new insights into algorithm behavior (in particular, the influence of dataset scale and shape) and a more complete characterization of the state of the art. Our methodology is able to resolve inconsistencies in prior empirical studies and place algorithm performance in context through comparison to simple baselines. Finally, we pose open research questions which we hope will guide future algorithm design.
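The claim that error can be data dependent can be illustrated with a toy 1-D histogram experiment. This is a minimal sketch, assuming NumPy, a histogram in which each record falls in exactly one bin (so adding or removing one record changes one count by 1 and the L1 sensitivity is 1), and ε = 0.1. `identity` adds Laplace noise to every bin; `fixed_buckets` is a hypothetical estimator, not one of the 15 benchmarked algorithms, whose noise is fixed but whose error depends on the shape of the data. All function names are illustrative, and both estimators are scored by mean absolute error over every range query.

```python
import numpy as np

def identity(x, epsilon, rng):
    # Laplace mechanism on every bin: one record changes one bin by 1,
    # so the L1 sensitivity is 1 and the noise scale is 1/epsilon.
    return x + rng.laplace(scale=1.0 / epsilon, size=len(x))

def fixed_buckets(x, epsilon, width, rng):
    # Hypothetical toy estimator: noisy totals of disjoint buckets of `width`
    # bins, each spread evenly over its bins. Buckets are disjoint, so one
    # record still changes the vector of bucket totals by 1 (sensitivity 1).
    est = np.empty(len(x))
    for start in range(0, len(x), width):
        stop = min(start + width, len(x))
        total = x[start:stop].sum() + rng.laplace(scale=1.0 / epsilon)
        est[start:stop] = total / (stop - start)
    return est

def mean_range_error(x, est):
    # Mean absolute error over every range query [i, j] with i <= j,
    # computed from prefix sums.
    c_true = np.concatenate(([0.0], np.cumsum(x)))
    c_est = np.concatenate(([0.0], np.cumsum(est)))
    i, j = np.triu_indices(len(x))
    return np.mean(np.abs((c_true[j + 1] - c_true[i]) - (c_est[j + 1] - c_est[i])))

def compare(x, epsilon=0.1, width=8, trials=50, seed=0):
    # Average each estimator's error over repeated runs with the same budget.
    rng = np.random.default_rng(seed)
    err_id = np.mean([mean_range_error(x, identity(x, epsilon, rng)) for _ in range(trials)])
    err_bk = np.mean([mean_range_error(x, fixed_buckets(x, epsilon, width, rng)) for _ in range(trials)])
    return err_id, err_bk

n = 64
flat = np.full(n, 100.0)   # 6,400 records spread evenly
spiky = np.zeros(n)
spiky[10] = 6400.0         # the same 6,400 records in a single bin

for name, x in [("flat", flat), ("spiky", spiky)]:
    err_id, err_bk = compare(x)
    print(f"{name}: identity={err_id:.1f}  buckets={err_bk:.1f}")
```

With the same ε, the bucket estimator adds fewer noise terms per range, so it should beat per-bin noise on the flat histogram; on the spiky histogram, spreading the spike over its bucket introduces bias that the per-bin approach avoids, so the ranking should reverse.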

What problem does this paper attempt to address?

The main problem this paper addresses is the shortcomings of current methods for evaluating differentially private algorithms, especially empirical evaluation. Specifically:

1. **Gaps and Inconsistencies in Empirical Evaluation**: As algorithms grow more complex, their error becomes harder to characterize through theoretical analysis, so sound empirical evaluation becomes more important. However, because of the order in which algorithms were published, the lack of benchmark datasets, and space limits in publications, existing empirical evaluations do not cover all existing algorithms. This leaves gaps, and even inconsistencies, in the understanding of algorithm performance.
2. **Understanding Data Dependence**: Many newly proposed algorithms are data-dependent, that is, their error is sensitive to characteristics of the input data. As a result, an algorithm that performs well on one dataset may perform poorly on another. For data owners, predicting how these algorithms will perform on a new dataset or experimental setting is a challenge.
3. **Selecting the Values of Free Parameters**: Algorithms often have free parameters (in addition to the privacy parameter \(\epsilon\)) whose effect on error has not been fully quantified. Published work gives data owners little guidance on how to set them, and the suggested default values may not be optimal.
4. **Unsound Utility Comparisons**: Even a comprehensive empirical analysis may be of little use to practitioners. For example, the error of a differentially private algorithm is a random variable, yet most empirical analyses report only the average error and ignore its variation. In addition, algorithm error is often not compared with simple baselines (such as the Laplace mechanism), which are the first algorithms practitioners are likely to try.
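Point 4 treats error as a random variable. A minimal sketch of reporting an error distribution instead of a single mean, assuming NumPy, one 1-D range query covering `length` histogram bins, ε = 0.5, and sensitivity 1 under adding or removing one record: it compares summing independent per-bin Laplace noise with answering the range directly with one Laplace draw, and reports each strategy's mean error as a ratio to the direct answer. The helper names and parameter values are hypothetical.

```python
import numpy as np

def error_profile(errors):
    # Summarize a sample of absolute errors instead of reporting only the mean.
    errors = np.asarray(errors)
    return {
        "mean": errors.mean(),
        "std": errors.std(),
        "p95": np.percentile(errors, 95),
    }

def identity_range_error(length, epsilon, rng):
    # Range answered by summing `length` bins, each with independent Laplace(1/epsilon) noise.
    return abs(rng.laplace(scale=1.0 / epsilon, size=length).sum())

def direct_range_error(epsilon, rng):
    # The same range answered with a single Laplace(1/epsilon) draw
    # (the Laplace mechanism applied to this one query, sensitivity 1).
    return abs(rng.laplace(scale=1.0 / epsilon))

rng = np.random.default_rng(1)
epsilon, length, trials = 0.5, 16, 20_000
direct = error_profile([direct_range_error(epsilon, rng) for _ in range(trials)])
identity = error_profile([identity_range_error(length, epsilon, rng) for _ in range(trials)])

for name, prof in [("direct", direct), ("identity", identity)]:
    ratio = prof["mean"] / direct["mean"]
    print(f"{name}: mean={prof['mean']:.2f} std={prof['std']:.2f} "
          f"p95={prof['p95']:.2f} ratio={ratio:.2f}")
```

With noise scale b = 1/ε = 2, the direct answer's absolute error is exponentially distributed with mean b, so its mean should be near 2 and its 95th percentile near 2 ln 20 ≈ 6.0; the per-bin sum has expected squared error 2·length·b² = 128. Answering a query directly is only a fair comparison when the workload has a single query; with many queries the budget has to be shared.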
To address these problems, the paper proposes a set of evaluation principles and develops DPBench, a new framework for standardized evaluation of differentially private algorithms. Using this framework, the authors conduct a comprehensive empirical study of 15 published algorithms for answering 1- and 2-dimensional range queries on 27 datasets. The study aims to provide new insights into algorithm behavior, especially how dataset scale and shape affect performance, and to give practitioners better guidance when selecting an algorithm.
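The role of dataset scale can be sketched with a toy experiment. This is a minimal sketch, assuming NumPy, a 1-D histogram with a fixed "ramp" shape sampled at different numbers of records, ε = 0.1, and sensitivity 1. It compares per-bin Laplace noise with a hypothetical estimator that spreads one noisy total evenly over all bins, and divides error by the number of records so that results at different scales are comparable. The function names are illustrative and neither function is claimed to be DPBench's implementation.

```python
import numpy as np

def identity(x, epsilon, rng):
    # Laplace(1/epsilon) noise on every bin (sensitivity 1).
    return x + rng.laplace(scale=1.0 / epsilon, size=len(x))

def uniform(x, epsilon, rng):
    # Hypothetical toy estimator: one noisy total spread evenly over all bins.
    noisy_total = x.sum() + rng.laplace(scale=1.0 / epsilon)
    return np.full(len(x), noisy_total / len(x))

def scaled_range_error(x, est):
    # Mean absolute error over all ranges [i, j], divided by the scale (record count).
    c_true = np.concatenate(([0.0], np.cumsum(x)))
    c_est = np.concatenate(([0.0], np.cumsum(est)))
    i, j = np.triu_indices(len(x))
    err = np.abs((c_true[j + 1] - c_true[i]) - (c_est[j + 1] - c_est[i]))
    return err.mean() / x.sum()

def sweep_scale(scales, epsilon=0.1, n=32, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    shape = np.arange(1, n + 1, dtype=float)
    shape /= shape.sum()  # fixed "ramp" shape, normalized to probabilities
    results = {}
    for scale in scales:
        # Sample `scale` records from the fixed shape.
        x = rng.multinomial(scale, shape).astype(float)
        results[scale] = (
            np.mean([scaled_range_error(x, identity(x, epsilon, rng)) for _ in range(trials)]),
            np.mean([scaled_range_error(x, uniform(x, epsilon, rng)) for _ in range(trials)]),
        )
    return results

for scale, (e_id, e_un) in sweep_scale([100, 10_000, 1_000_000]).items():
    print(f"scale={scale:>9,}: identity={e_id:.2e}  uniform={e_un:.2e}")
```

Per-bin noise does not depend on the data, so its scaled error should shrink roughly in proportion to 1/scale. The uniform estimator's error is dominated by the mismatch between the ramp and a flat histogram, which grows with the number of records, so its scaled error should stay roughly constant. In this toy setting it should therefore win at small scale and lose at large scale.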

Principled Evaluation of Differentially Private Algorithms using DPBench

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Differentially Private SQL with Bounded User Contribution

Not one but many Tradeoffs: Privacy Vs. Utility in Differentially Private Machine Learning

Algorithms with More Granular Differential Privacy Guarantees

DP-Auditorium: a Large Scale Library for Auditing Differential Privacy

Efficient Batch Query Answering Under Differential Privacy

Privacy Profiles for Private Selection

Privacy accounting $\varepsilon$conomics: Improving differential privacy composition via a posteriori bounds

Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets

DProvDB: Differentially Private Query Processing with Multi-Analyst Provenance

How Private are DP-SGD Implementations?

Do I Get the Privacy I Need? Benchmarking Utility in Differential Privacy Libraries

Benchmarking Secure Sampling Protocols for Differential Privacy

Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

Budget Recycling Differential Privacy

Differentially Private Algorithms for Empirical Machine Learning

Wasserstein Differential Privacy

Evaluating Differentially Private Machine Learning in Practice

Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

A Programming Framework for Differential Privacy with Accuracy Concentration Bounds