Differentially Private Top-k Selection via Canonical Lipschitz Mechanism

Michael Shekelyan, Grigorios Loukides
DOI: https://doi.org/10.48550/arXiv.2201.13376
2022-02-01
Abstract: Selecting the top-$k$ highest scoring items under differential privacy (DP) is a fundamental task with many applications. This work presents three new results. First, the exponential mechanism, permute-and-flip and report-noisy-max, as well as their oneshot variants, are unified into the Lipschitz mechanism, an additive noise mechanism with a single DP-proof via a mandated Lipschitz property for the noise distribution. Second, this new generalized mechanism is paired with a canonical loss function to obtain the canonical Lipschitz mechanism, which can directly select $k$-subsets out of $d$ items in $O(dk+d \log d)$ time. The canonical loss function assesses subsets by how many users must change for the subset to become top-$k$. Third, this composition-free approach to subset selection improves utility guarantees by an $\Omega(\log k)$ factor compared to one-by-one selection via sequential composition, and our experiments on synthetic and real-world data indicate substantial utility improvements.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The paper addresses how to select the top-k items efficiently and accurately while satisfying differential privacy (DP). Specifically, it focuses on selecting the k highest-scoring items from a set of d candidates while protecting user privacy and keeping the selection accurate.

### Background and Problem Description

In many applications, such as feature selection, policy evaluation, and model selection, the top-k items must be selected from a set of candidates. These applications usually rely on user data, which raises privacy concerns. Differential privacy addresses these concerns by ensuring that the selected k items do not depend too heavily on the private information of any single user.

### Main Contributions of the Paper

1. **Lipschitz Mechanism**:
   - Unifies several existing mechanisms (the exponential mechanism, permute-and-flip, and report-noisy-max, as well as their oneshot variants) into a single additive noise mechanism, the Lipschitz mechanism.
   - The mechanism ensures ε-differential privacy by adding noise whose distribution satisfies a 1-Lipschitz condition, so one DP proof covers all of the unified mechanisms.
2. **Canonical Lipschitz Mechanism**:
   - Pairs the Lipschitz mechanism with a canonical loss function to obtain the canonical Lipschitz mechanism, which directly selects a k-subset from d items in O(dk + d log d) time.
   - The canonical loss function assesses a subset by how many users must change for that subset to become the top-k.
3. **Composition-Free vs. Sequential Composition**:
   - Proves that the composition-free approach (such as CANONICAL) improves the utility guarantee by a factor of Ω(log k) compared with one-by-one selection via sequential composition (such as PEELING).
   - Experiments on synthetic and real-world data indicate that the composition-free approach substantially improves utility.

### Formula Representation

- **Lipschitz Condition**:
  \[ \left| \log(1 - F(x)) - \log(1 - F(x + c)) \right| \leq |c| \]
  where \( F \) is the cumulative distribution function of the noise and \( F^{-1} \) is its inverse function.
- **Output of the Canonical Lipschitz Mechanism**:
  \[ \{Y_1, \dots, Y_\kappa\} = \arg \max_{y \in Y[\kappa]} \left\{ \text{LOSS}(y|\vec{x}) - \frac{2\kappa \Delta_{\text{LOSS}}}{\varepsilon} + F^{-1}(U_y) \right\} \]
  Here \( \kappa \) is the subset size, written k elsewhere in this summary.
- **Canonical Loss Function**:
  \[ \text{LOSS}(y|\vec{x}) = (1 - \gamma)\vec{x}[h + 1] - \gamma \vec{x}[t] \]
  where \( y \in C_{h,t} \) and \( \gamma \in [0,1] \); this loss function has a sensitivity of 1.

### Conclusion

The paper addresses efficient top-k selection under differential privacy by proposing the Lipschitz mechanism and the canonical Lipschitz mechanism, and it supports their effectiveness through theoretical analysis and experiments.
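
The additive-noise view and the Lipschitz condition from the formulas above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it selects a single item rather than a k-subset, it does not implement the canonical loss, and it maximizes scores rather than working with losses. The function names (`noisy_argmax`, `max_lipschitz_ratio`) are hypothetical, and the ε/(2Δ) scaling is the conventional one for these mechanisms, assumed here rather than taken from the paper. The sketch relies on two well-known equivalences: Gumbel noise reproduces the exponential mechanism (the Gumbel-max trick), and exponential noise corresponds to permute-and-flip.

```python
import math
import random


def gumbel_inv_cdf(u):
    """Inverse CDF of the standard Gumbel distribution."""
    return -math.log(-math.log(u))


def exponential_inv_cdf(u):
    """Inverse CDF of the unit-rate exponential distribution."""
    return -math.log(1.0 - u)


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))


def exponential_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


def max_lipschitz_ratio(cdf, xs, c):
    """Largest |log(1-F(x)) - log(1-F(x+c))| / |c| observed over the grid xs.

    A value of at most 1 is consistent with the Lipschitz condition;
    this is a numerical spot check, not a proof.
    """
    def log_survival(x):
        return math.log(1.0 - cdf(x))

    return max(abs(log_survival(x) - log_survival(x + c)) / abs(c) for x in xs)


def noisy_argmax(scores, eps, sensitivity, inv_cdf, rng):
    """Return the index maximizing scaled score plus noise F^{-1}(U).

    Assumption: scores are scaled by eps / (2 * sensitivity), the usual
    calibration for the exponential mechanism and permute-and-flip.
    """
    scale = eps / (2.0 * sensitivity)
    best_index, best_value = None, -math.inf
    for i, score in enumerate(scores):
        # Clamp U away from 0 and 1 so the inverse CDFs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        value = scale * score + inv_cdf(u)
        if value > best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [12.0, 30.0, 27.0, 5.0]  # hypothetical item counts
    grid = [i / 10 for i in range(-50, 51)]
    print("Gumbel ratio:", max_lipschitz_ratio(gumbel_cdf, grid, 0.1))
    print("Exponential ratio:", max_lipschitz_ratio(exponential_cdf, grid, 0.1))
    print("Selected index:", noisy_argmax(scores, 1.0, 1.0, gumbel_inv_cdf, rng))
```

Swapping `inv_cdf` changes the mechanism while the selection code stays the same, which is the sense in which a single additive-noise template can cover several selection mechanisms.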