PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy

Leixia Wang, Qingqing Ye, Haibo Hu, Xiaofeng Meng
2024-08-24
Abstract: Answering range queries in the context of Local Differential Privacy (LDP) is a widely studied problem in Online Analytical Processing (OLAP). Existing LDP solutions all assume a uniform data distribution within each domain partition, which may not align with real-world scenarios where data distribution is varied, resulting in inaccurate estimates. To address this problem, we introduce PriPL-Tree, a novel data structure that combines hierarchical tree structures with piecewise linear (PL) functions to answer range queries for arbitrary distributions. PriPL-Tree precisely models the underlying data distribution with a few line segments, leading to more accurate results for range queries. Furthermore, we extend it to multi-dimensional cases with novel data-aware adaptive grids. These grids leverage the insights from marginal distributions obtained through PriPL-Trees to partition the grids adaptively, adapting to the density of the underlying distributions. Our extensive experiments on both real and synthetic datasets demonstrate the effectiveness and superiority of PriPL-Tree over state-of-the-art solutions in answering range queries across arbitrary data distributions.
Cryptography and Security, Databases
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the accuracy of range queries under Local Differential Privacy (LDP). Existing LDP solutions assume that the data distribution within each domain partition is uniform. Real-world distributions are varied, so this assumption produces inaccurate estimates. To solve this problem, the authors introduce a new data structure, **PriPL-Tree**, which combines a hierarchical tree structure with piecewise linear (PL) functions to answer range queries under arbitrary distributions. PriPL-Tree improves range query accuracy by modeling the underlying data distribution with a small number of line segments. The authors also extend the method to the multi-dimensional case and propose adaptive grids that adapt to data distributions of different densities.

### Summary of key issues

1. **Limitations of existing methods**:
   - Existing LDP solutions assume that data is uniformly distributed within each domain partition, which does not match real-world data.
   - When the assumption fails, it introduces non-uniform errors that reduce the accuracy of query results.
2. **Innovations of PriPL-Tree**:
   - **Modeling with piecewise linear functions**: PriPL-Tree approximates the underlying data distribution with piecewise linear functions instead of relying on the uniform distribution assumption, which captures complex distributions more accurately.
   - **Reducing noise errors**: By reducing the number of nodes and the height of the tree, PriPL-Tree reduces noise errors in the LDP setting.
   - **Adapting to large-scale data**: The number of parameters of PriPL-Tree depends only on the shape of the data distribution, not on the domain size, so it adapts better to large domains.
3. **Multi-dimensional extension**:
   - To handle multi-dimensional range queries, the authors propose an adaptive grid method. The grids are partitioned according to the marginal distributions obtained through PriPL-Trees, so they adapt to data distributions of different densities.

### Conclusion

This paper addresses the accuracy problem of existing LDP methods on arbitrary data distributions by proposing PriPL-Tree and its multi-dimensional extension. Experimental results show that PriPL-Tree outperforms existing methods on real and synthetic datasets and significantly improves the accuracy of range queries.
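
The contrast between the uniform assumption and piecewise linear modeling can be illustrated with a small sketch. It is not the authors' algorithm: it contains no LDP perturbation and no tree, and it assumes a continuous one-dimensional domain with the segment endpoints already known. The function names (`segment_mass`, `range_query_pl`, `range_query_uniform`) and the example density are hypothetical, chosen only to show why a few line segments can answer a range query more accurately than a per-partition uniform estimate.

```python
def segment_mass(x0, y0, x1, y1, a, b):
    """Integrate the line through (x0, y0) and (x1, y1) over [a, b]."""
    slope = (y1 - y0) / (x1 - x0)
    fa = y0 + slope * (a - x0)
    fb = y0 + slope * (b - x0)
    # The trapezoid rule is exact for a linear function.
    return 0.5 * (fa + fb) * (b - a)


def range_query_pl(breaks, heights, lo, hi):
    """Mass on [lo, hi] of a piecewise linear density.

    breaks[i] is the i-th breakpoint and heights[i] the density value
    there (illustrative parameterization, assumed for this sketch).
    """
    total = 0.0
    for i in range(len(breaks) - 1):
        # Clip the query range to the current segment.
        a, b = max(lo, breaks[i]), min(hi, breaks[i + 1])
        if a < b:
            total += segment_mass(breaks[i], heights[i],
                                  breaks[i + 1], heights[i + 1], a, b)
    return total


def range_query_uniform(edges, masses, lo, hi):
    """Mass on [lo, hi] assuming data is uniform inside each partition."""
    total = 0.0
    for i in range(len(edges) - 1):
        a, b = max(lo, edges[i]), min(hi, edges[i + 1])
        if a < b:
            # Each partition contributes in proportion to the overlap length.
            total += masses[i] * (b - a) / (edges[i + 1] - edges[i])
    return total


# Example density f(x) = 2x on [0, 1]; the true mass of [0, 0.5] is 0.25.
pl_answer = range_query_pl([0.0, 1.0], [0.0, 2.0], 0.0, 0.5)
uniform_answer = range_query_uniform([0.0, 1.0], [1.0], 0.0, 0.5)
print(pl_answer, uniform_answer)  # 0.25 0.5
```

With a single partition, the uniform estimate returns 0.5 for a range whose true mass is 0.25, while a single line segment recovers the exact value. In a real LDP setting both estimates would also carry perturbation noise, which this sketch omits.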