Privacy without noise

Yitao Duan
DOI: https://doi.org/10.1145/1645953.1646160
2009-01-01
Abstract:This paper presents several results on statistical database privacy. We first point out a serious vulnerability in a widely-accepted approach which perturbs query results with additive noise. We then show that for sum queries which aggregate across all records, when the dataset is sufficiently large, the inherent uncertainty associated with unknown quantities is enough to provide similar perturbation and the same privacy can be obtained without external noise. Sum query is a surprisingly general primitive supporting a large number of data mining algorithms such as SVD, PCA, k-means, ID3, SVM, EM, and all the algorithms in the statistical query model. We derive privacy conditions for sum queries and provide the first mathematical proof for the intuition that aggregates across a large number of individuals is private using a widely accepted notion of privacy. We also show how the results can be used to construct simulatable query auditing algorithms with stronger privacy.
What problem does this paper attempt to address?