Differentially private anonymized histograms

Ananda Theertha Suresh
DOI: https://doi.org/10.48550/arXiv.1910.03553
2020-01-14
Abstract: For a dataset of label-count pairs, an anonymized histogram is the multiset of counts. Anonymized histograms appear in various potentially sensitive contexts such as password-frequency lists, degree distributions in social networks, and estimation of symmetric properties of discrete distributions. Motivated by these applications, we propose the first differentially private mechanism to release anonymized histograms that achieves a near-optimal privacy-utility trade-off both in terms of the number of items and the privacy parameter. Further, if the underlying histogram is given in a compact format, the proposed algorithm runs in time sub-linear in the number of items. For anonymized histograms generated from unknown discrete distributions, we show that the released histogram can be directly used for estimating symmetric properties of the underlying distribution.
Subjects: Machine Learning, Cryptography and Security, Data Structures and Algorithms, Information Theory
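
To make the abstract's definition concrete, the following minimal Python sketch builds an anonymized histogram (the multiset of counts) from label-count pairs. It illustrates the definition only and is not the paper's private mechanism: no noise is added and nothing here is differentially private. The function names, the sample data, and the count-to-multiplicity dictionary used as a "compact format" are assumptions made for illustration; the abstract does not specify the compact representation.

```python
from collections import Counter


def anonymized_histogram(label_counts):
    """Return the multiset of counts as a descending list, discarding labels."""
    return sorted(label_counts.values(), reverse=True)


def compact_form(label_counts):
    """Map each count value to the number of labels that have that count.

    This is one possible compact representation (an assumption, not taken
    from the abstract): its size depends on the number of distinct counts
    rather than on the number of labels.
    """
    return dict(Counter(label_counts.values()))


# Hypothetical password-frequency data, for illustration only.
passwords = {"123456": 5, "qwerty": 3, "letmein": 3, "hunter2": 1}

# Relabeling the items does not change the anonymized histogram.
relabeled = {"a": 3, "b": 1, "c": 5, "d": 3}

hist = anonymized_histogram(passwords)
compact = compact_form(passwords)
```

Because the labels are dropped, any two datasets that differ only by renaming labels produce the same anonymized histogram, which is why the object is suited to symmetric properties of a distribution.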