Locally Private High-Dimensional Crowdsourced Data Release Based on Copula Functions

Teng Wang, Xinyu Yang, Xuebin Ren, Wei Yu, Shusen Yang
DOI: https://doi.org/10.1109/TSC.2019.2961092
IF: 11.019
2022-01-01
IEEE Transactions on Services Computing
Abstract: With the increasing popularity of crowdsourcing services, high-dimensional crowdsourced data provides a wealth of knowledge. However, it also exposes participants to unprecedented privacy threats, owing to complex correlations among multiple attributes and the vulnerabilities of untrusted crowdsourcing servers. Differential privacy-based paradigms have been proposed to release privacy-preserving datasets with statistical approximation, but most existing schemes are limited when facing highly correlated attributes and cannot prevent privacy threats from untrusted crowdsourcing servers. To address these issues, we propose two novel solutions, LoCop and DR_LoCop, which guarantee local differential privacy based on the randomized response technique while synthesizing and releasing high-dimensional crowdsourced data with high data utility. Specifically, LoCop leverages copula theory to synthesize high-dimensional crowdsourced data from univariate marginal distributions and attribute dependence. The univariate marginal distributions are estimated by a Lasso-based regression algorithm from aggregated privacy-preserving bit strings, and the dependencies among attributes are modeled as a multivariate Gaussian copula. Building on LoCop, the enhanced solution DR_LoCop not only uses a C-vine copula to capture conditional dependencies among high-dimensional attributes, but also achieves dimension reduction. Extensive experiments on real-world datasets demonstrate that our solutions substantially outperform state-of-the-art techniques in terms of both data utility and computational overhead.
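The two building blocks named in the abstract, randomized response and a Gaussian copula, can be illustrated with a minimal Python sketch. This is not the paper's LoCop pipeline: it perturbs a single bit per user instead of the bit strings the abstract mentions, skips the Lasso-based marginal estimation, and uses only two attributes. The function names, the correlation value rho, and the quantile-table marginals are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    This is the classic binary randomized response, which satisfies
    epsilon-local differential privacy for epsilon > 0.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the fraction of true 1-bits from noisy reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = pi * p + (1 - pi) * (1 - p); solve for pi.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def sample_gaussian_copula(rho, marginal_a, marginal_b, n, rng):
    """Draw n synthetic pairs whose dependence follows a bivariate Gaussian copula.

    marginal_a and marginal_b are sorted lists used as empirical quantile
    tables (hypothetical stand-ins for estimated marginal distributions).
    """
    std_normal = NormalDist()
    rows = []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then through each marginal's quantile table.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        a = marginal_a[min(int(u1 * len(marginal_a)), len(marginal_a) - 1)]
        b = marginal_b[min(int(u2 * len(marginal_b)), len(marginal_b) - 1)]
        rows.append((a, b))
    return rows
```

In this sketch the server only ever sees the perturbed bits, yet can still recover an aggregate frequency, and the copula step separates "what each attribute looks like" (the marginals) from "how attributes move together" (rho).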
What problem does this paper attempt to address?