Multi-Party High-Dimensional Data Publishing Under Differential Privacy
Xiang Cheng, Peng Tang, Sen Su, Rui Chen, Zequn Wu, Binyuan Zhu
DOI: https://doi.org/10.1109/tkde.2019.2906610
IF: 9.235
2020-08-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: In this paper, we study the problem of publishing high-dimensional data in a distributed multi-party environment under differential privacy. In particular, with the assistance of a semi-trusted curator, the parties (i.e., local data owners) collectively generate a synthetic integrated dataset while satisfying $\varepsilon$-differential privacy. To solve this problem, we present a differentially private sequential update of Bayesian network (DP-SUBN) approach. In DP-SUBN, the parties and the curator collaboratively identify the Bayesian network $\mathbb{N}$ that best fits the integrated dataset in a sequential manner, from which a synthetic dataset can then be generated. The fundamental advantage of adopting the sequential update manner is that the parties can treat the intermediate results provided by previous parties as their prior knowledge to direct how to learn $\mathbb{N}$. The core of DP-SUBN is the construction of the search frontier, which can be seen as a priori knowledge to guide the parties to update $\mathbb{N}$. By exploiting the correlations of attribute pairs, we propose exact and heuristic methods to construct the search frontier. In particular, to privately quantify the correlations of attribute pairs without introducing too much noise, we first put forward a non-overlapping covering design (NOCD) method, and then devise a dynamic programming method for determining the optimal parameters used in NOCD. Through privacy analysis, we show that DP-SUBN satisfies $\varepsilon$-differential privacy. Extensive experiments on real datasets demonstrate that DP-SUBN offers desirable data utility with low communication cost.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
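
The abstract mentions privately quantifying the correlations of attribute pairs. As a rough illustration of that general idea only, the Python sketch below perturbs the joint histogram of one attribute pair with the standard Laplace mechanism and computes mutual information from the noisy result. It is not the paper's NOCD method or the DP-SUBN protocol. The function names (`laplace_noise`, `noisy_joint`, `mutual_information`), the add/remove-one-record neighboring definition (giving sensitivity 1), and the choice of mutual information as the correlation measure are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_joint(records, i, j, domain_i, domain_j, epsilon, rng):
    """Return a noisy joint distribution of attributes i and j (hypothetical helper)."""
    counts = Counter((r[i], r[j]) for r in records)
    # Assumption: neighboring datasets differ by adding or removing one record,
    # so one histogram cell changes by 1 and the L1 sensitivity is 1.
    scale = 1.0 / epsilon
    noisy = {
        (a, b): max(counts.get((a, b), 0) + laplace_noise(scale, rng), 0.0)
        for a in domain_i
        for b in domain_j
    }
    # Clamping and normalizing are post-processing and consume no privacy budget.
    total = sum(noisy.values())
    if total == 0.0:
        return {cell: 1.0 / len(noisy) for cell in noisy}
    return {cell: v / total for cell, v in noisy.items()}


def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution over pairs."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0.0
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: two binary attributes that always agree.
    data = [(0, 0)] * 500 + [(1, 1)] * 500
    joint = noisy_joint(data, 0, 1, [0, 1], [0, 1], epsilon=1.0, rng=rng)
    print(mutual_information(joint))
```

A smaller `epsilon` adds more noise to each cell, so the estimated correlation of a pair becomes less reliable; the abstract's concern with not "introducing too much noise" when many attribute pairs must be scored reflects this trade-off.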