Differentially Private Hypothesis Testing, Revisited.

Yue Wang, Jaewoo Lee, Daniel Kifer
2015-01-01
Abstract: Hypothesis testing is different from traditional applications of differential privacy in that one needs an accurate estimate of how the noise affects the result (i.e., a $p$-value). Previous approaches to differentially private hypothesis testing either used output perturbation techniques that generally had large sensitivities (hence risked swamping the data with noise), or input perturbation techniques that resulted in highly unreliable $p$-values (and hence invalid statistical conclusions). In this paper, we develop a variety of practical hypothesis tests that address these problems. Using a different asymptotic regime that is more suited to hypothesis testing with privacy, we show a modified equivalence between chi-squared tests and likelihood ratio tests. We then develop differentially private likelihood ratio and chi-squared tests for a variety of applications on tabular data (i.e., independence, homogeneity, and goodness-of-fit tests). An open problem is whether new test statistics specialized to differential privacy could lead to further improvements. To aid in this search, we further propose a permutation-based testbed that can allow experimenters to empirically estimate the behavior of new test statistics for private hypothesis testing before fully working out their mathematical details (such as approximate null distributions). Experimental evaluations on small and large datasets using a wide variety of privacy settings demonstrate the practicality and reliability of our methods.
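To make concrete why the noise must enter the $p$-value calculation, the following is a minimal illustrative sketch of a goodness-of-fit test on Laplace-perturbed counts. It is not the paper's procedure. The function names (`chi2_stat`, `private_gof_pvalue`), the Laplace scale of `2 / epsilon` (which assumes that changing one record moves two cell counts by one each), the assumption that the sample size `n` is public, and the Monte Carlo approximation of the null distribution are all assumptions made for illustration.

```python
import numpy as np


def chi2_stat(counts, probs, n):
    """Pearson chi-squared statistic for (possibly noisy) cell counts."""
    expected = n * probs
    return float(np.sum((counts - expected) ** 2 / expected))


def private_gof_pvalue(counts, probs, epsilon, n_sim=2000, seed=0):
    """Goodness-of-fit test on Laplace-perturbed counts (illustrative only).

    The null distribution is simulated with the same noise added, so the
    p-value reflects both sampling variability and privacy noise.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = int(counts.sum())      # assumption: the sample size is public
    scale = 2.0 / epsilon      # assumption: L1 sensitivity of the histogram is 2

    # Perturb the observed histogram and compute the statistic on noisy counts.
    noisy = counts + rng.laplace(0.0, scale, size=counts.shape)
    observed = chi2_stat(noisy, probs, n)

    # Simulate data under the null hypothesis, add the same kind of noise,
    # and recompute the statistic for each simulated table.
    sims = rng.multinomial(n, probs, size=n_sim).astype(float)
    sims += rng.laplace(0.0, scale, size=sims.shape)
    expected = n * probs
    null_stats = np.sum((sims - expected) ** 2 / expected, axis=1)

    # Monte Carlo p-value with the usual +1 correction.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_sim + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    stat, p = private_gof_pvalue([900, 50, 50], [1 / 3, 1 / 3, 1 / 3], epsilon=1.0)
    print(f"statistic={stat:.1f}, p-value={p:.4f}")
```

Comparing the noisy statistic against a standard chi-squared reference distribution would ignore the variance contributed by the Laplace noise. Simulating the null with the noise included is one simple way to account for it, under the assumptions stated above.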