Sketches-based join size estimation under local differential privacy

Meifan Zhang, Xin Liu, Lihua Yin
2024-05-19
Abstract: Join size estimation on sensitive data poses a risk of privacy leakage. Local differential privacy (LDP) is a solution to preserve privacy while collecting sensitive data, but it introduces significant noise when dealing with sensitive join attributes that have large domains. Employing probabilistic structures such as sketches is a way to handle large domains, but it leads to hash-collision errors. To achieve accurate estimations, it is necessary to reduce both the noise error and hash-collision error. To tackle the noise error caused by protecting sensitive join values with large domains, we introduce a novel algorithm called LDPJoinSketch for sketch-based join size estimation under LDP. Additionally, to address the inherent hash-collision errors in sketches under LDP, we propose an enhanced method called LDPJoinSketch+. It utilizes a frequency-aware perturbation mechanism that effectively separates high-frequency and low-frequency items without compromising privacy. The proposed methods satisfy LDP, and the estimation error is bounded. Experimental results show that our method outperforms existing methods, effectively enhancing the accuracy of join size estimation under LDP.
Databases, Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses join size estimation over sensitive data under local differential privacy (LDP). Specifically:

1. **Noise error**: When the sensitive join attribute has a large domain, LDP perturbation introduces significant noise, which degrades estimation accuracy. To address this, the authors propose LDPJoinSketch, a sketch-based join size estimation algorithm under LDP.
2. **Hash-collision error**: Probabilistic structures such as sketches can handle large domains, but they introduce hash-collision errors. To reduce these errors, the authors propose an enhanced method, LDPJoinSketch+, which uses a frequency-aware perturbation mechanism to separate high-frequency and low-frequency items without compromising privacy.

Together, the two methods aim to reduce both noise error and hash-collision error under LDP and thereby improve the accuracy of join size estimation. The authors state that both methods satisfy LDP with bounded estimation error, and that experiments show they outperform existing methods.
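
The hash-collision error described above can be illustrated with a plain, non-private sketch. The following is a minimal Python sketch of join size estimation with a Fast-AGMS-style structure, not the authors' LDPJoinSketch or LDPJoinSketch+: it applies no LDP perturbation at all. The names `FastAGMS`, `build_sketch`, `estimate_join_size`, and `exact_join_size` are hypothetical, the default sizes are arbitrary, and the BLAKE2b-derived hash is an assumed stand-in for the independent hash families used in formal sketch analyses.

```python
import hashlib
from collections import Counter
from statistics import median


def _hash64(row: int, item: int) -> int:
    """Deterministic 64-bit hash of (row, item).

    Assumption: stands in for the independent hash families
    that formal sketch analyses rely on.
    """
    digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class FastAGMS:
    """Non-private Fast-AGMS-style sketch (illustrative only)."""

    def __init__(self, rows: int = 5, cols: int = 256) -> None:
        self.rows = rows
        self.cols = cols
        self.table = [[0] * cols for _ in range(rows)]

    def add(self, item: int) -> None:
        # Each row maps the item to one bucket and adds a +1/-1 sign.
        for j in range(self.rows):
            h = _hash64(j, item)
            sign = 1 if h & 1 else -1
            bucket = (h >> 1) % self.cols
            self.table[j][bucket] += sign


def build_sketch(values, rows: int = 5, cols: int = 256) -> FastAGMS:
    """Summarize a join-attribute column as a sketch."""
    sketch = FastAGMS(rows, cols)
    for v in values:
        sketch.add(v)
    return sketch


def estimate_join_size(a: FastAGMS, b: FastAGMS):
    """Median over rows of the row-wise inner product of two sketches."""
    if (a.rows, a.cols) != (b.rows, b.cols):
        raise ValueError("sketches must share the same dimensions")
    per_row = [
        sum(x * y for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a.table, b.table)
    ]
    return median(per_row)


def exact_join_size(left, right) -> int:
    """Ground truth: sum over values of freq_left(v) * freq_right(v)."""
    freq_left, freq_right = Counter(left), Counter(right)
    return sum(count * freq_right[v] for v, count in freq_left.items())
```

When distinct join values land in the same bucket, their signed counts mix and the estimate drifts from `exact_join_size`; increasing `cols` makes such collisions rarer. Under LDP, each user's contribution would additionally be perturbed before aggregation, which is the noise error the paper targets and which this example deliberately leaves out.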