Set-valued Data Publication with Local Privacy
Shaowei Wang,Yuqiu Qian,Jiachun Du,Wei Yang,Liusheng Huang,Hongli Xu
DOI: https://doi.org/10.14778/3389133.3389140
IF: 2.5
2020-01-01
Proceedings of the VLDB Endowment
Abstract:Most user-generated data in online services are presented as set-valued data, e.g., visited website URLs, recently used Apps by a person, and etc. These data are of great value to service providers, but also bring privacy concerns if collected and analyzed directly. To tackle potential privacy threatens, local differential privacy (LDP) attracts increasing attention nowadays. However, existing approaches only provide sub-optimal error bound for set-valued data distribution estimation with LDP. Besides, it is computational expensive and communication expensive to use for high dimensional set-valued data, considering large domains in real scenarios. Thus, existing approaches are unpractical to use on resource-constrained user-side devices (e.g., smartphones and wearable devices). In this paper, we propose a utility-optimal and efficient set-valued data publication method (i.e., wheel mechanism). On the user side, each user contributes only one numerical value to represent their privatized data. The computational complexity is O(min{m log m, meɛ}) and communication cost is O(log(meɛ)) bits, while existing approaches usually depend on O(d) or O(log d), where m is the number of items in the set-valued data (m ≡ 1 for categorical data), d is the domain size (usually d ≫ m) and ɛ is the privacy budget. On the server side, the estimator takes numerical values from users as input and derives an unbiased distribution estimation. Theoretical results show that estimation error bounds are improved from previously known [EQUATION] to the optimal rate [EQUATION]. Results on extensive experiments demonstrate that our proposed wheel mechanism is 3-100× faster than existing approaches, meanwhile has optimal statistical efficiency.