Abstract: Selecting the top-$k$ highest scoring items under differential privacy (DP) is a fundamental task with many applications. This work presents three new results. First, the exponential mechanism, permute-and-flip and report-noisy-max, as well as their oneshot variants, are unified into the Lipschitz mechanism, an additive noise mechanism with a single DP-proof via a mandated Lipschitz property for the noise distribution. Second, this new generalized mechanism is paired with a canonical loss function to obtain the canonical Lipschitz mechanism, which can directly select $k$-subsets out of $d$ items in $O(dk+d \log d)$ time. The canonical loss function assesses subsets by how many users must change for the subset to become top-$k$. Third, this composition-free approach to subset selection improves utility guarantees by an $\Omega(\log k)$ factor compared to one-by-one selection via sequential composition, and our experiments on synthetic and real-world data indicate substantial utility improvements.

What problem does this paper attempt to address?

This paper addresses the problem of selecting the top-$k$ items efficiently and accurately under differential privacy (DP). Specifically, it studies how to select the $k$ highest-scoring items from a set of candidates while protecting user privacy and keeping the selection accurate.

### Background and Problem Description

Many applications, such as feature selection, policy evaluation and model selection, need to select the top-$k$ items from a set of candidates. These applications usually rely on user data, which raises privacy concerns. Differential privacy addresses them by ensuring that the selected $k$ items do not depend too strongly on the private information of any single user.

### Main Contributions of the Paper

1. **Lipschitz Mechanism**:
   - Unifies several existing mechanisms (the exponential mechanism, permute-and-flip, report-noisy-max and their oneshot variants) into a single additive noise mechanism, the Lipschitz mechanism.
   - The mechanism ensures $\varepsilon$-differential privacy by adding noise whose distribution satisfies a 1-Lipschitz condition.
2. **Canonical Lipschitz Mechanism**:
   - Pairs the Lipschitz mechanism with a canonical loss function to obtain the canonical Lipschitz mechanism, which directly selects a $k$-subset out of $d$ items in $O(dk + d \log d)$ time.
   - The canonical loss function assesses a subset by how many users must change for that subset to become top-$k$.
3. **Composition-Free Selection vs. Sequential Composition**:
   - Proves that the composition-free approach (such as CANONICAL) improves the utility guarantee over one-by-one selection via sequential composition (such as PEELING) by a factor of $\Omega(\log k)$.
   - Experiments on synthetic and real-world data indicate substantial utility improvements for the composition-free approach.

### Formula Representation

- **Lipschitz Condition**:
  \[ | \log(1 - F(x)) - \log(1 - F(x + c)) | \leq |c| \]
  where $F$ is the cumulative distribution function of the noise and $F^{-1}$ is its inverse function.
- **Output of the Canonical Lipschitz Mechanism**:
  \[ \{Y_1, \dots, Y_\kappa\} = \arg \max_{y \in Y[\kappa]} \left\{ \text{LOSS}(y|\hat{x}) - \frac{2\kappa \Delta_{\text{LOSS}}}{\varepsilon} + F^{-1}(U_y) \right\} \]
- **Canonical Loss Function**:
  \[ \text{LOSS}(y|\vec{x}) = (1 - \gamma)\vec{x}[h + 1] - \gamma \vec{x}[t] \]
  where $y \in C_{h,t}$ and $\gamma \in [0,1]$. This loss function has a sensitivity of 1.

### Conclusion

The paper addresses efficient top-$k$ selection under differential privacy by proposing the Lipschitz mechanism and the canonical Lipschitz mechanism, and supports their effectiveness through theoretical analysis and experiments.
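The Lipschitz condition on the noise distribution can be checked numerically. The sketch below is illustrative only and is not the paper's implementation: the function names (`max_violation`, `noisy_argmax`, `SAMPLERS`) are hypothetical, and the pairing of Gumbel noise with the exponential mechanism, exponential noise with permute-and-flip, and Laplace noise with report-noisy-max follows the standard formulations of those mechanisms rather than anything stated in this summary. The score scaling $\varepsilon / (2\Delta)$ is likewise an assumption taken from those standard formulations.

```python
import numpy as np

# log(1 - F(x)) for unit-scale noise distributions.
def log_survival_exponential(x, rate=1.0):
    # Exponential: 1 - F(x) = exp(-rate * x) for x >= 0, and 1 for x < 0.
    return -rate * np.maximum(x, 0.0)

def log_survival_gumbel(x):
    # Standard Gumbel: F(x) = exp(-exp(-x)); expm1 keeps the tail accurate.
    return np.log(-np.expm1(-np.exp(-x)))

def log_survival_laplace(x):
    # Laplace(0, 1): 1 - F(x) = 1 - 0.5*exp(x) for x < 0, 0.5*exp(-x) otherwise.
    left = np.log1p(-0.5 * np.exp(np.minimum(x, 0.0)))
    right = np.log(0.5) - np.maximum(x, 0.0)
    return np.where(x < 0, left, right)

def max_violation(log_survival, n=2001, seed=0):
    """Largest sampled value of |log S(x) - log S(x + c)| - |c|.

    A value at or below zero (up to rounding) is consistent with the
    1-Lipschitz condition on the sampled points; it is not a proof.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 20.0, size=n)
    c = rng.uniform(-5.0, 5.0, size=n)
    gap = np.abs(log_survival(x) - log_survival(x + c)) - np.abs(c)
    return float(gap.max())

# Hypothetical registry of unit-scale noise samplers.
SAMPLERS = {
    "gumbel": lambda rng, size: rng.gumbel(size=size),
    "exponential": lambda rng, size: rng.exponential(size=size),
    "laplace": lambda rng, size: rng.laplace(size=size),
}

def noisy_argmax(scores, epsilon, sensitivity=1.0, noise="gumbel", rng=None):
    """Additive-noise selection: argmax of scaled scores plus unit-scale noise.

    Assumption: scores are scaled by epsilon / (2 * sensitivity), as in the
    standard forms of the three mechanisms named above.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(scores, dtype=float) * epsilon / (2.0 * sensitivity)
    return int(np.argmax(scaled + SAMPLERS[noise](rng, scaled.shape)))

if __name__ == "__main__":
    for name, fn in [("exponential", log_survival_exponential),
                     ("gumbel", log_survival_gumbel),
                     ("laplace", log_survival_laplace)]:
        print(name, "max violation:", max_violation(fn))
    # A rate-2 exponential is too concentrated and breaks the condition.
    print("rate-2 exponential:",
          max_violation(lambda x: log_survival_exponential(x, rate=2.0)))
    print("selected index:",
          noisy_argmax([3.0, 7.0, 6.5], epsilon=1.0, rng=np.random.default_rng(1)))
```

Only the noise sampler changes between the three variants while the selection step stays the same, which is the sense in which a single additive-noise template can cover them. The rate-2 exponential serves as a negative control for the check.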

Differentially Private Top-k Selection via Canonical Lipschitz Mechanism

Permute-and-Flip: A new mechanism for differentially private selection

Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

Learning Numeric Optimal Differentially Private Truncated Additive Mechanisms

Tight Data Access Bounds for Private Top-$k$ Selection

A bounded-noise mechanism for differential privacy

DP-SIPS: A simpler, more scalable mechanism for differentially private partition selection

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Bounded and Unbiased Composite Differential Privacy

The optimal mechanism in differential privacy

Privacy Profiles for Private Selection

Output Perturbation for Differentially Private Convex Optimization: Faster and More General

Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets

Privacy accounting $\varepsilon$conomics: Improving differential privacy composition via a posteriori bounds

Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy

Differentially Private Multivariate Statistics with an Application to Contingency Table Analysis

Local Differential Private Data Aggregation for Discrete Distribution Estimation

Differentially Private Kernel Density Estimation

Mutual Information Optimally Local Private Discrete Distribution Estimation

Optimal Tree-Based Mechanisms for Differentially Private Approximate CDFs

Improving the Privacy Loss Under User-Level DP Composition for Fixed Estimation Error