Principled Evaluation of Differentially Private Algorithms using DPBench

Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang
DOI: https://doi.org/10.48550/arXiv.1512.04817
2015-12-15
Abstract: Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is data dependent, meaning the distribution of the noise added to query answers may change depending on the input data. Theoretical analysis typically only considers the worst case, making empirical study of average case performance increasingly important. In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. Based on these principles we propose DPBench, a novel evaluation framework for standardized evaluation of privacy algorithms. We then apply our benchmark to evaluate algorithms for answering 1- and 2-dimensional range queries. The result is a thorough empirical study of 15 published algorithms on a total of 27 datasets that offers new insights into algorithm behavior, in particular the influence of dataset scale and shape, and a more complete characterization of the state of the art. Our methodology is able to resolve inconsistencies in prior empirical studies and place algorithm performance in context through comparison to simple baselines. Finally, we pose open research questions which we hope will guide future algorithm design.
Databases, Cryptography and Security
What problem does this paper attempt to address?
The paper addresses deficiencies in how differentially private algorithms are currently evaluated, especially in empirical studies. Specifically:

1. **Gaps and Inconsistencies in Empirical Evaluation**: As algorithms grow more complex, their error becomes harder to characterize through theoretical analysis, so sound empirical evaluation matters more. However, because of the chronological order of publication, the lack of benchmark datasets, and space limits in publications, existing empirical evaluations do not cover all existing algorithms. The result is gaps, and even inconsistencies, in what is known about algorithm performance.
2. **Understanding Data Dependence**: Many newly proposed algorithms are data-dependent: their error is sensitive to characteristics of the input data. An algorithm that performs well on one dataset may therefore perform poorly on another. For data owners, predicting how these algorithms will perform on a new dataset or in a new experimental setting is a challenge.
3. **Selecting the Values of Free Parameters**: Algorithms usually have free parameters in addition to the privacy parameter \(\epsilon\), but the effect of these parameters on error has not been fully quantified. Published research gives data owners little guidance on how to set them, and the suggested defaults are not always optimal.
4. **Unreasonable Utility Comparisons**: Even a comprehensive empirical analysis may be of little use to practitioners. For example, the error of a differentially private algorithm is a random variable, yet most empirical analyses report only the average error and ignore its variation. In addition, algorithm error is often not compared against simple baselines (such as the Laplace mechanism), which are the algorithms practitioners would try first.
To address these problems, the paper proposes a set of evaluation principles and develops DPBench, a new framework for standardized evaluation of differentially private algorithms. Using this framework, the authors conduct a comprehensive empirical study of 15 published algorithms on 27 datasets. The study offers new insights into algorithm behavior, especially the influence of dataset scale and shape on performance, and gives practitioners better guidance for selecting algorithms.
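
Point 4 above (error is a random variable, and should be compared against a simple baseline) can be illustrated with a minimal sketch. This is not DPBench code and does not use the paper's error measure. It assumes a small made-up histogram, a hypothetical set of range queries, per-cell sensitivity 1 (adding or removing one record changes one cell by 1), and mean absolute error per trial. All function names are illustrative.

```python
import random
import statistics


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_histogram(counts, epsilon, rng):
    # Baseline: add independent Laplace(1/epsilon) noise to every cell.
    # Assumes sensitivity 1 for the histogram (an assumption of this sketch).
    scale = 1.0 / epsilon
    return [c + laplace_noise(rng, scale) for c in counts]


def range_query(hist, lo, hi):
    # Sum of cells lo..hi inclusive (a 1-dimensional range query).
    return sum(hist[lo:hi + 1])


def error_distribution(counts, queries, epsilon, trials, seed=0):
    # Run the mechanism `trials` times; return one error value per trial
    # (mean absolute error over the query workload).
    rng = random.Random(seed)
    truth = [range_query(counts, lo, hi) for lo, hi in queries]
    errors = []
    for _ in range(trials):
        noisy = laplace_histogram(counts, epsilon, rng)
        abs_err = [abs(range_query(noisy, lo, hi) - t)
                   for (lo, hi), t in zip(queries, truth)]
        errors.append(statistics.mean(abs_err))
    return errors


def summarize(errors):
    # Report spread, not just the average.
    ordered = sorted(errors)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {
        "mean": statistics.mean(errors),
        "stdev": statistics.stdev(errors),
        "p95": p95,
    }


# Made-up data and workload, for illustration only.
counts = [40, 10, 0, 0, 25, 5, 0, 20]
queries = [(0, 3), (2, 5), (0, 7), (4, 4)]

errors = error_distribution(counts, queries, epsilon=0.1, trials=200)
summary = summarize(errors)
print(summary)
```

Reporting the standard deviation and a high percentile alongside the mean shows how much the error varies from run to run. A candidate algorithm could be passed through the same `error_distribution` loop so that its error distribution is compared directly with this baseline.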