Privacy-Preserving Correlated Data Publication: Privacy Analysis and Optimal Noise Design
Mingjing Sun, Chengcheng Zhao, Jianping He, Peng Cheng, Daniel E. Quevedo
DOI: https://doi.org/10.1109/tnse.2020.3044590
IF: 6.6
2020-01-01
IEEE Transactions on Network Science and Engineering
Abstract: Privacy in data publication is a critical issue that has been studied extensively. Published data almost always exhibit intrinsic correlations arising from social, physical, behavioral, and genetic relationships. However, most existing works assume that private data are independent, neglecting the correlation among data. In this paper, we investigate privacy in data publication under both deterministic and probabilistic correlations. Specifically, $(\varepsilon, \delta)$-multi-dimensional data-privacy (MDDP) is proposed to quantify the privacy of correlated data. It characterizes the probability that private data are disclosed, within a given accuracy, when the published data are estimated jointly with the correlation. We then explore the effects of deterministic and probabilistic correlations on privacy disclosure. For both kinds of correlation, we show that privacy disclosure increases compared with the case without correlation knowledge. We also derive a closed-form expression of the disclosure probability and a strict bound on the privacy disclosure gain. To minimize the disclosure probability, we provide the optimal noise distribution in the sense of $(\varepsilon, \delta)$-MDDP. Extensive simulations on a real dataset verify our analytical results.
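The claim that correlation knowledge increases disclosure can be illustrated with a toy calculation. The sketch below is not the paper's $(\varepsilon, \delta)$-MDDP definition or its analysis. It assumes, purely for illustration, that each private value is published with independent zero-mean Gaussian noise of standard deviation `sigma`, that the correlation is deterministic and fully known to the adversary (every value is a known offset of the first), and that the adversary averages the offset-corrected observations. The function names and default parameters are hypothetical.

```python
import math


def disclosure_probability(epsilon, error_std):
    """P(|estimate - true value| <= epsilon) for a zero-mean Gaussian
    estimation error with standard deviation error_std."""
    return math.erf(epsilon / (error_std * math.sqrt(2.0)))


def toy_comparison(epsilon=0.5, sigma=1.0, n_correlated=2):
    """Compare disclosure with and without correlation knowledge.

    Assumption (illustrative only): x_k = x_1 + c_k with known offsets c_k,
    and each published value is y_k = x_k + Gaussian noise of std sigma.
    """
    # No correlation knowledge: only y_1 is usable, so the error std is sigma.
    p_independent = disclosure_probability(epsilon, sigma)
    # Known deterministic correlation: averaging the n offset-corrected
    # observations shrinks the error std to sigma / sqrt(n).
    p_correlated = disclosure_probability(
        epsilon, sigma / math.sqrt(n_correlated)
    )
    return p_independent, p_correlated


if __name__ == "__main__":
    p_ind, p_cor = toy_comparison()
    print(f"without correlation: {p_ind:.4f}")
    print(f"with correlation:    {p_cor:.4f}")
```

Under these assumptions the second probability is always the larger one whenever more than one correlated value is published, which matches the qualitative statement in the abstract. The paper's closed-form expression and bound are not reproduced here.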