Segmented Private Data Aggregation in the Multi-message Shuffle Model

Shaowei Wang, Ruilin Yang, Sufen Zeng, Kaiqi Yu, Rundong Mei, Shaozheng Huang, Wei Yang
2024-07-29
Abstract: The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., internet of things, mobile edge networks). Particularly, the multi-message shuffle model, where each user may contribute multiple messages, has shown that accuracy can approach that of the central model of DP. However, existing studies typically assume a uniform privacy protection level for all users, which may deter conservative users from participating and prevent liberal users from contributing more information, thereby reducing the overall data utility, such as the accuracy of aggregated statistics. In this work, we pioneer the study of segmented private data aggregation within the multi-message shuffle model of DP, introducing flexible privacy protection for users and enhanced utility for the aggregation server. Our framework not only protects users' data but also anonymizes their privacy level choices to prevent potential data leakage from these choices. To optimize the privacy-utility-communication trade-offs, we explore approximately optimal configurations for the number of blanket messages and conduct almost tight privacy amplification analyses within the shuffle model. Through extensive experiments, we demonstrate that our segmented multi-message shuffle framework achieves a reduction of about 50% in estimation error compared to existing approaches, significantly enhancing both privacy and utility.
Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is how to achieve segmented private data aggregation in the multi-message shuffle model, giving users flexible privacy protection levels while improving the utility of the aggregated data. Specifically:

1. **Flexible user privacy protection**:
   - Existing work usually assumes a single privacy protection level for all users, which may deter conservative users from participating and prevent liberal users from contributing more information, reducing overall data utility.
   - The paper proposes a framework in which each user selects the privacy protection level that suits their own needs, and these selections remain anonymous to the server and to other potential privacy attackers.
2. **Improved utility of data aggregation**:
   - By introducing segmented privacy protection, the framework protects users' privacy while also improving aggregation utility, reducing estimation error by about 50% compared with existing methods.
3. **Privacy amplification analysis**:
   - To optimize the privacy-utility-communication trade-off, the authors explore approximately optimal configurations for the number of "blanket messages" and conduct almost tight privacy amplification analyses.
   - Supporting personalized privacy budgets while keeping privacy amplification tight is a significant challenge in the multi-message shuffle model, and the paper studies this problem in depth.
4. **Concrete protocols and experimental validation**:
   - The authors develop concrete protocols for set-valued data aggregation and verify their effectiveness and efficiency through extensive experiments.
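The blanket-message idea behind item 3 can be illustrated with a generic multi-message shuffle sketch. It is not the paper's protocol: each user sends their true value plus some uniformly random blanket messages, the per-level counts in `BLANKETS_PER_LEVEL` are a hypothetical stand-in for the paper's segmented configuration, and values are single items rather than sets. The server debiases the shuffled histogram using only the total number of messages, which shows why it never needs any individual user's level.

```python
import random
from collections import Counter

# HYPOTHETICAL mapping from privacy level (1 = most conservative) to the
# number of uniform blanket messages a user adds. Illustrative only; the
# paper derives its own (approximately optimal) blanket counts.
BLANKETS_PER_LEVEL = {1: 8, 2: 4, 3: 1}


def randomize(x: int, level: int, d: int, rng: random.Random) -> list[int]:
    """User side: the true value plus blanket values drawn uniformly from [0, d)."""
    blankets = [rng.randrange(d) for _ in range(BLANKETS_PER_LEVEL[level])]
    return [x] + blankets


def shuffle(batches: list[list[int]], rng: random.Random) -> list[int]:
    """Shuffler: pool all messages and permute them, removing sender identities."""
    pool = [m for batch in batches for m in batch]
    rng.shuffle(pool)
    return pool


def estimate_counts(pool: list[int], n: int, d: int) -> list[float]:
    """Server: unbiased per-value count estimates.

    Exactly n messages are real; the other len(pool) - n are uniform blankets,
    so each value receives (len(pool) - n) / d blankets in expectation. Only
    the pool size is needed, not any individual user's level.
    """
    counts = Counter(pool)
    expected_blankets = (len(pool) - n) / d
    return [counts[v] - expected_blankets for v in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    data = [0] * 50 + [1] * 30 + [2] * 20  # 100 users, true counts [50, 30, 20, 0]
    levels = [rng.choice([1, 2, 3]) for _ in data]  # each user picks a level
    pool = shuffle([randomize(x, k, d, rng) for x, k in zip(data, levels)], rng)
    print([round(c, 1) for c in estimate_counts(pool, len(data), d)])
```

The demo should print estimates scattered around the true counts [50, 30, 20, 0]. Their spread grows with the total number of blankets, which is the utility cost that the blanket-count configuration in item 3 has to balance against privacy and communication; the sketch itself makes no privacy claim.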
In summary, the paper addresses the rigidity of uniform privacy protection in existing methods by introducing a segmented privacy protection mechanism, achieving more flexible privacy protection and higher data utility in the multi-message shuffle model.

### Formula and symbol explanations

- \( \epsilon_i \) denotes the privacy budget of user \( i \).
- \( E = \{E_1, E_2, \ldots, E_K\} \in \mathbb{R}^K \) denotes the list of privacy options, where \( E_k \leq E_{k+1} \) for \( k \in [K - 1] \).
- \( L = \{k_1, k_2, \ldots, k_n\} \in [K]^n \) denotes the users' privacy level selections.
- \( D_p(R_v(x_v) \| R_v(x'_v)) \) denotes the \( p \)-hockey-stick divergence between the output distributions of \( R_v \) on inputs \( x_v \) and \( x'_v \).
- The privacy amplification analysis uses the random variables \( C \sim \text{Binom}(n - 1, \frac{2\beta p}{(p - 1)q}) \), \( A \sim \text{Binom}(C, \frac{1}{2}) \), \( \Delta_1 \sim \text{Bernoulli}(\frac{\beta p}{p - 1}) \), and \( \Delta_2 \sim \text{Bernoulli}(1 - \Delta_1, \frac{\beta}{p - 1 - \beta p}) \).

With this notation, the paper formalizes segmented privacy protection in the multi-message shuffle model and analyzes its privacy amplification.
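The hockey-stick divergence listed above has a standard closed form on a finite domain, \( D_p(P \| Q) = \sum_x \max(0, P(x) - p\,Q(x)) \), and a local randomizer is \( (\epsilon, \delta) \)-DP exactly when \( D_{e^{\epsilon}} \le \delta \) for every pair of inputs. The sketch below computes it, together with a mapping from level choices \( L \) to budgets \( \epsilon_i = E_{k_i} \); that mapping is an assumption consistent with the definitions of \( E \) and \( L \), not a formula quoted from the paper. Binary randomized response is a stand-in randomizer chosen only for illustration (it is not the paper's \( R_v \)), and all function names are hypothetical.

```python
import math


def hockey_stick(P: list[float], Q: list[float], p: float) -> float:
    """D_p(P || Q) = sum_x max(0, P(x) - p * Q(x)) for distributions on a finite domain."""
    if len(P) != len(Q):
        raise ValueError("P and Q must be defined on the same domain")
    return sum(max(0.0, px - p * qx) for px, qx in zip(P, Q))


def user_budgets(E: list[float], L: list[int]) -> list[float]:
    """ASSUMED mapping: user i with level k_i (1-based, in [K]) gets eps_i = E_{k_i}."""
    if any(E[k] > E[k + 1] for k in range(len(E) - 1)):
        raise ValueError("privacy options must satisfy E_k <= E_{k+1}")
    if any(not 1 <= k <= len(E) for k in L):
        raise ValueError("each level must lie in [1, K]")
    return [E[k - 1] for k in L]


def binary_rr(eps: float, bit: int) -> list[float]:
    """Output distribution of binary randomized response (stand-in randomizer)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [keep, 1.0 - keep] if bit == 0 else [1.0 - keep, keep]


if __name__ == "__main__":
    budgets = user_budgets([0.5, 1.0, 2.0], [1, 3, 2])  # three users' level choices
    for eps in budgets:
        P, Q = binary_rr(eps, 0), binary_rr(eps, 1)
        # delta at p = e^eps is ~0 (pure eps-DP); at the smaller p = e^(eps/2) it is positive.
        print(eps, hockey_stick(P, Q, math.exp(eps)), hockey_stick(P, Q, math.exp(eps / 2)))
```

Setting \( p = 1 \) reduces the divergence to total variation distance, which for binary randomized response equals \( \tanh(\epsilon/2) \); this is a quick sanity check on the implementation.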