Rate-Optimal Rank Aggregation with Private Pairwise Rankings

Shirong Xu, Will Wei Sun, Guang Cheng
2024-08-09
Abstract: In various real-world scenarios, such as recommender systems and political surveys, pairwise rankings are commonly collected and utilized for rank aggregation to obtain an overall ranking of items. However, preference rankings can reveal individuals' personal preferences, underscoring the need to protect them from being released for downstream analysis. In this paper, we address the challenge of preserving privacy while ensuring the utility of rank aggregation based on pairwise rankings generated from a general comparison model. Using the randomized response mechanism to perturb raw pairwise rankings is a common privacy protection strategy used in practice. However, a critical challenge arises because the privatized rankings no longer adhere to the original model, resulting in significant bias in downstream rank aggregation tasks. Motivated by this, we propose adaptively debiasing the rankings from the randomized response mechanism, ensuring consistent estimation of true preferences and enhancing the utility of downstream rank aggregation. Theoretically, we offer insights into the relationship between overall privacy guarantees and estimation errors from private ranking data, and establish minimax rates for estimation errors. This enables the determination of optimal privacy guarantees that balance consistency in rank aggregation with privacy protection. We also investigate convergence rates of expected ranking errors for partial and full ranking recovery, quantifying how privacy protection influences the specification of top-$K$ item sets and complete rankings. Our findings are validated through extensive simulations and a real application.
Machine Learning, Cryptography and Security
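As a rough illustration of the bias described in the abstract, the sketch below simulates BTL comparisons for a single pair of items, perturbs them with randomized response, and compares the naive average with a debiased one. It assumes a flip probability of \( 1/(e^\epsilon + 1) \) and an inverse-probability rescaling at a single privacy level \( \epsilon \). The function names, parameter values, and simulation setup are illustrative assumptions, and the sketch does not implement the paper's adaptive weighting across multiple privacy levels.

```python
import math
import random


def btl_prob(theta_i, theta_j):
    """P(item i is preferred to item j) under the BTL model."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


def randomized_response(y, epsilon, rng):
    """Flip a binary comparison y with probability 1 / (e^epsilon + 1)."""
    flip_prob = 1.0 / (math.exp(epsilon) + 1.0)
    return 1 - y if rng.random() < flip_prob else y


def debias(y_private, epsilon):
    """Rescale a privatized comparison so that its mean equals the raw win probability."""
    e = math.exp(epsilon)
    return ((e + 1.0) * y_private - 1.0) / (e - 1.0)


def simulate(theta_i=1.0, theta_j=0.0, epsilon=1.0, n=200_000, seed=0):
    """Return (true win probability, naive private mean, debiased private mean).

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    p_true = btl_prob(theta_i, theta_j)
    private = []
    for _ in range(n):
        y = 1 if rng.random() < p_true else 0  # raw comparison outcome
        private.append(randomized_response(y, epsilon, rng))
    naive = sum(private) / n
    debiased = sum(debias(y, epsilon) for y in private) / n
    return p_true, naive, debiased


if __name__ == "__main__":
    p_true, naive, debiased = simulate()
    print(f"true win probability: {p_true:.3f}")
    print(f"naive private mean:   {naive:.3f}")
    print(f"debiased mean:        {debiased:.3f}")
```

In this setup the naive average of the privatized comparisons is pulled toward 1/2, while the debiased average stays close to the true win probability. The rescaling multiplies each privatized value by a factor larger than one, so the debiased estimate has higher variance.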
What problem does this paper attempt to address?
The core problem this paper addresses is: **How can rank aggregation based on pairwise ranking data remain useful and accurate while personal privacy is protected?** Specifically, the paper focuses on reducing the distortion that privacy-protection mechanisms (such as the randomized response mechanism) introduce into the original ranking distribution, so as to achieve consistent parameter estimation and improve the performance of downstream rank aggregation.

---

### Decomposition of the main problems in the paper:

1. **Trade-off between privacy and utility**:
   - In practical applications (such as recommender systems and political surveys), pairwise ranking data may reveal personal preferences, so privacy protection is required.
   - However, standard privacy-protection methods (such as the randomized response mechanism) change the distribution of the original ranking data, resulting in significant bias in downstream rank aggregation.
   - The goal of the paper is to design a method that minimizes the loss of utility while protecting privacy.
2. **Limitations of the classical randomized response mechanism**:
   - The classical randomized response mechanism injects noise into the pairwise ranking data, so the privatized rankings no longer follow the original model (such as the BTL model or TM model).
   - As a result, the true preference parameter vector \( \theta^\star \) can no longer be estimated consistently from the privatized rankings.
3. **Design of the adaptive debiasing randomized response mechanism**:
   - To solve these problems, the paper proposes an **Adaptive De-biasing Random Response Mechanism (ADRR)**, which reduces the bias by adjusting the weights while maintaining privacy protection.
   - The ADRR mechanism brings the privatized ranking data closer to the distribution of the original model, thereby improving the utility of downstream tasks.
4. **Theoretical analysis and convergence rate**:
   - The paper analyzes how privacy protection affects the parameter estimation error and establishes minimax convergence rates.
   - Specifically, it studies the convergence rate of the expected ranking error for partial and full ranking recovery, and quantifies how privacy protection affects the recovery of the top-K item set and of the complete ranking.

---

### Summary of mathematical formulas:

1. **Probability distribution of the BTL model**:
   \[ P(y_{ij} = 1) = F(\theta_i^\star - \theta_j^\star) \]
   where \( F(x) = (1 + e^{-x})^{-1} \) is the logistic function.
2. **Output distribution of the randomized response mechanism**:
   \[ \widetilde{Y}_{ij} = 1 \quad \text{with probability} \quad \frac{1}{2} + \frac{e^{\theta_i^\star} - e^{\theta_j^\star}}{e^{\theta_i^\star} + e^{\theta_j^\star}}\left(\frac{1}{2} - p_\epsilon\right) \]
   where \( p_\epsilon = \frac{1}{e^\epsilon + 1} \) is the flip probability and \( \epsilon \) is the privacy parameter.
3. **Formula for the debiasing step**:
   \[ \widetilde{z}_{ij}^{(l)} = \frac{(e^{\epsilon_l} + 1)\widetilde{Y}_{ij}^{(l)} - 1}{e^{\epsilon_l} - 1} \]
4. **Formula for the adaptive debiasing randomized response mechanism**:
   \[ z_{ij}^{(l)} = w_l \widetilde{z}_{ij}^{(l)} = \frac{(e^{\epsilon_l} - 1)^2}{\sum_{l' = 1}^{L}(e^{\epsilon_{l'}} - 1)^2}\,\widetilde{z}_{ij}^{(l)} \]
5. **Convergence rate of parameter estimation error**: \[ m^{-\frac{1}{2}}\|