Abstract: Collecting and analyzing users' set-valued data under privacy protection is a common real-world scenario. However, existing solutions under local differential privacy (LDP) are not efficient enough, because each user perturbs their data locally, which introduces a large amount of noise. The shuffle model, which adds a shuffler to LDP to shuffle all perturbed values, amplifies privacy and thereby improves utility. Inspired by this, we study frequency estimation and top-$k$ frequent item estimation for set-valued data in the shuffle model. To handle users holding different numbers of items and to further improve utility, we combine sampling with shuffling and propose the Encoding, Padding, Sampling, and Shuffling framework, i.e., EPS$^{2}$. Based on this framework, we propose three protocols for frequency estimation in different application scenarios, then assemble them into multi-phase protocols for top-$k$ frequent item estimation. Theoretically, we show that all three protocols gain dual privacy amplification from sampling and shuffling. By setting the size of each user's set to 1, we extend this amplified bound to the single-valued frequency estimation scenario, obtaining a tighter privacy bound than existing works. Finally, we perform experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our protocols.
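
The pipeline named in the abstract (encode, pad, sample, perturb locally, shuffle, then estimate) can be illustrated with a minimal sketch. The abstract does not specify the three protocols, so everything concrete here is an assumption made for illustration: the function names, the padding length `pad_len`, the use of `None` as a dummy item, generalized randomized response as the local randomizer, and the debiasing estimator are standard padding-and-sampling choices, not the authors' protocols. The sketch also does not compute the amplified privacy bound.

```python
import math
import random
from collections import Counter


def pad_and_sample(items, pad_len, rng):
    """Pad (or randomly truncate) a user's set to pad_len entries, then sample one."""
    items = list(items)
    if len(items) > pad_len:
        items = rng.sample(items, pad_len)  # truncation introduces bias
    padded = items + [None] * (pad_len - len(items))  # None marks a dummy item
    return rng.choice(padded)


def grr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response: keep the value w.p. p, else report another one."""
    p = math.exp(epsilon) / (math.exp(epsilon) + len(domain) - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def eps2_style_sketch(user_sets, item_domain, pad_len, epsilon, seed=0):
    """Hypothetical end-to-end run: returns estimated user counts per item."""
    rng = random.Random(seed)
    domain = list(item_domain) + [None]  # real items plus the dummy
    # Each user pads, samples one entry, and perturbs it locally.
    reports = [
        grr_perturb(pad_and_sample(s, pad_len, rng), domain, epsilon, rng)
        for s in user_sets
    ]
    rng.shuffle(reports)  # the shuffler breaks the link between user and report
    n, d = len(reports), len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = 1.0 / (math.exp(epsilon) + d - 1)
    counts = Counter(reports)
    # Debias the randomized response, then scale by pad_len to undo sampling.
    return {v: pad_len * (counts[v] - n * q) / (p - q) for v in item_domain}
```

Under these assumptions the estimate is unbiased only when no user's set exceeds `pad_len`; choosing `pad_len` trades truncation bias against the variance added by the `pad_len` scaling factor.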

EPS$^{2}$: Privacy Preserving Set-Valued Data Analysis in the Shuffle Model
