Differentially private data release for data mining

Noman Mohammed,Rui Chen,Benjamin C.M. Fung,Philip S. Yu
DOI: https://doi.org/10.1145/2020408.2020487
2011-01-01
Abstract:Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ∈-differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ∈-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ∈-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.
What problem does this paper attempt to address?