Locally differentially private high-dimensional data synthesis
Xue Chen, Cheng Wang, Qing Yang, Teng Hu, Changjun Jiang
DOI: https://doi.org/10.1007/s11432-022-3583-x
2023-01-05
Science China Information Sciences
Abstract: In local differential privacy (LDP), generating high-dimensional data while efficiently capturing the correlations between attributes in a dataset is a challenging problem. Existing solutions for low-dimensional data synthesis, which partition the privacy budget among all attributes, become ineffective in high-dimensional scenarios because of the large-scale noise and communication cost that high dimensionality incurs. However, high dimensionality not only brings challenges but also makes it possible to apply techniques that break this bottleneck. This paper presents SamPrivSyn, a method for high-dimensional data synthesis under LDP that is composed of a marginal sampling module and a data generation module. The marginal sampling module samples from the original data to obtain two-way marginals. The sampling process is based on mutual information, which is updated iteratively to retain as much of the correlation between attributes as possible. The data generation module reconstructs the synthetic dataset from the sampled two-way marginals. Comparison experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed method, with results showing that SamPrivSyn can not only protect privacy but also retain the correlation information between attributes.
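The role of mutual information in ranking attribute pairs can be illustrated with a minimal sketch. It is not the SamPrivSyn algorithm: it applies no LDP perturbation and no iterative update, and the function names `two_way_marginal` and `mutual_information`, as well as the toy records, are assumptions introduced only for illustration. It shows how a two-way marginal yields a mutual information score that separates a strongly correlated attribute pair from an independent one.

```python
import math
from collections import Counter

def two_way_marginal(records, i, j):
    """Empirical joint distribution of attributes i and j (hypothetical helper)."""
    counts = Counter((r[i], r[j]) for r in records)
    n = len(records)
    return {pair: c / n for pair, c in counts.items()}

def mutual_information(joint):
    """Mutual information in bits of a two-way marginal given as {(a, b): p}."""
    px, py = Counter(), Counter()
    for (a, b), p in joint.items():
        px[a] += p  # marginal of the first attribute
        py[b] += p  # marginal of the second attribute
    return sum(
        p * math.log2(p / (px[a] * py[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

# Toy data (assumed): attributes 0 and 1 are identical, attribute 2 is independent of 0.
records = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
mi_01 = mutual_information(two_way_marginal(records, 0, 1))
mi_02 = mutual_information(two_way_marginal(records, 0, 2))
print(mi_01, mi_02)
```

In this toy case the pair (0, 1) scores higher than the pair (0, 2), so a sampler that favors high mutual information would prefer to keep the marginal of the first pair.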
computer science, information systems; engineering, electrical & electronic