Differentially Private Permutation Tests: Applications to Kernel Methods

Ilmun Kim, Antonin Schrab
2024-01-08
Abstract: Recent years have witnessed growing concerns about the privacy of sensitive data. In response to these concerns, differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles. While substantial progress has been made in private data analysis, existing methods often suffer from impracticality or a significant loss of statistical efficiency. This paper aims to alleviate these concerns in the context of hypothesis testing by introducing differentially private permutation tests. The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner. The power of the proposed test depends on the choice of a test statistic, and we establish general conditions for consistency and non-asymptotic uniform power. To demonstrate the utility and practicality of our framework, we focus on reproducing kernel-based test statistics and introduce differentially private kernel tests for two-sample and independence testing: dpMMD and dpHSIC. The proposed kernel tests are straightforward to implement, applicable to various types of data, and attain minimax optimal power across different privacy regimes. Our empirical evaluations further highlight their competitive power under various synthetic and real-world scenarios, emphasizing their practical value. The code is publicly available to facilitate the implementation of our framework.
Cryptography and Security, Machine Learning, Methodology
What problem does this paper attempt to address?
This paper addresses how to conduct hypothesis tests while protecting data privacy. Specifically, the authors aim to preserve the validity and statistical efficiency of hypothesis tests while achieving differential privacy (DP). Differential privacy is a rigorous privacy-protection framework; DP mechanisms typically protect individual records by adding calibrated noise, which usually degrades statistical performance. The goal of the paper is therefore a method that limits this loss while still guaranteeing differential privacy, particularly for nonparametric procedures such as permutation tests.

### Main Problems and Challenges

1. **Balance between Privacy and Statistical Efficiency**:
   - A strong privacy guarantee requires heavy perturbation of the data, which reduces statistical performance.
   - Conversely, reducing the perturbation improves statistical efficiency but weakens the privacy guarantee.
   - Finding an appropriate balance between the two is therefore the core issue.
2. **Limitations of Existing Methods**:
   - Many existing private hypothesis-testing methods rely on asymptotic approximations to determine the critical values of their test statistics. These approximations are only justified as the sample size grows, so they can be unreliable in practice at finite sample sizes.
   - Many existing methods are designed mainly for discrete data and do not apply directly to continuous or mixed-type data.
   - Most research focuses on theory and lacks detailed empirical evaluation and open-source code.

### Solutions

To address these challenges, the paper proposes differentially private permutation tests. The main features of the method are:

1. **Extension of Classical Permutation Tests**:
   - Extend the classical non-private permutation test to the differentially private setting; the extension applies to any test statistic with finite global sensitivity.
   - Improve the power of the test by using a quantile representation, which reduces the amount of noise that has to be added.
2. **Theoretical Properties**:
   - Establish sufficient conditions for pointwise consistency and non-asymptotic uniform power.
   - Propose specific kernel-based implementations for two-sample and independence testing, namely "dpMMD" and "dpHSIC".
3. **Empirical Evaluation**:
   - Demonstrate, through extensive experiments on synthetic and real-world data, that the proposed tests are competitive under different privacy settings.
   - Provide open-source code so that researchers and practitioners can use and verify the methods.

### Formula Examples

- **Global Sensitivity**:
  \[ \Delta_T := \sup_{\pi \in \Pi_n} \sup_{X_n, \tilde{X}_n: d_{\text{ham}}(X_n, \tilde{X}_n) \leq 1} \| T(X_n^\pi) - T(\tilde{X}_n^\pi) \| \]
- **p-value of the Permutation Test**:
  \[ \hat{p} := \frac{1}{B + 1} \left( \sum_{i = 1}^B 1\left( T(X_n^{\pi_i}) \geq T(X_n) \right) + 1 \right) \]
- **Laplace Mechanism**:
  \[ M_{\xi}^f(X_n; w) := f(X_n; w) + \frac{\Delta_1^f}{\xi} (\zeta_1, \ldots, \zeta_r)^\top \]
  where \(\Delta_1^f\) is the \(\ell_1\) global sensitivity of \(f\) and \(\zeta_1, \ldots, \zeta_r\) are independent and identically distributed as \(\text{Laplace}(0, 1)\).

Together, these methods and the accompanying theoretical analysis provide a new framework for effective hypothesis testing under differential privacy.
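To make the formulas above concrete, here is a minimal Python sketch that combines the permutation p-value with Laplace noise. It is an illustration under stated assumptions, not the authors' released code. The function names, the toy mean-difference statistic, the restriction of the data to [0, 1] (which gives a global sensitivity of at most 1 / min(n, m)), and the factor 2 in the noise scale are all choices made for this example.

```python
import numpy as np


def laplace_mechanism(values, sensitivity, xi, rng):
    """Add Laplace(0, 1) noise scaled by sensitivity / xi to each entry."""
    values = np.asarray(values, dtype=float)
    noise = rng.laplace(loc=0.0, scale=1.0, size=values.shape)
    return values + (sensitivity / xi) * noise


def permutation_p_value(permuted_stats, observed_stat):
    """Compute (#{i : T(X^{pi_i}) >= T(X)} + 1) / (B + 1)."""
    permuted_stats = np.asarray(permuted_stats, dtype=float)
    B = permuted_stats.size
    return (np.sum(permuted_stats >= observed_stat) + 1) / (B + 1)


def mean_difference(pooled, n):
    """Toy statistic: |mean of the first n points - mean of the rest|."""
    return abs(pooled[:n].mean() - pooled[n:].mean())


def noisy_permutation_test(x, y, xi, B=199, seed=0):
    """Permutation p-value computed from Laplace-perturbed statistics.

    ASSUMPTION: every observation lies in [0, 1], so replacing one point
    moves the statistic by at most 1 / min(n, m) under any permutation.
    """
    rng = np.random.default_rng(seed)
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    sensitivity = 1.0 / min(n, m)

    stats = np.empty(B + 1)
    stats[0] = mean_difference(pooled, n)  # observed statistic
    for i in range(1, B + 1):
        stats[i] = mean_difference(rng.permutation(pooled), n)

    # ASSUMPTION: noise scale 2 * sensitivity / xi; the factor 2 is a choice
    # made for this sketch and should be checked against the paper.
    noisy = laplace_mechanism(stats, 2.0 * sensitivity, xi, rng)
    return permutation_p_value(noisy[1:], noisy[0])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 0.3, size=50)  # two clearly separated samples
    y = rng.uniform(0.7, 1.0, size=50)
    print(noisy_permutation_test(x, y, xi=1.0))
```

A smaller `xi` means larger noise and therefore a less powerful test, which is the privacy versus efficiency trade-off described above. Because the observed and permuted statistics are perturbed in the same way, the p-value can never fall below 1 / (B + 1).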