CURATE: Scaling-up Differentially Private Causal Graph Discovery

Payel Bhattacharjee, Ravi Tandon
2024-09-28
Abstract: Causal Graph Discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents the joint distribution of the features of a dataset. CGD algorithms are broadly classified into two categories: (i) constraint-based algorithms, whose outcome depends on conditional independence (CI) tests, and (ii) score-based algorithms, whose outcome depends on an optimized score function. Since sensitive features of observational data are prone to privacy leakage, Differential Privacy (DP) has been adopted to ensure user privacy in CGD. Adding the same amount of noise throughout this sequential estimation process degrades the predictive performance of the algorithms. Because the initial CI tests in constraint-based algorithms and the later iterations of the optimization process in score-based algorithms are crucial, they need to be more accurate, i.e., less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms, which budget privacy uniformly across all iterations, CURATE allocates the privacy budget adaptively by minimizing the error probability (for constraint-based algorithms) and maximizing the number of iterations of the optimization problem (for score-based algorithms), while keeping the cumulative leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility than existing DP-CGD algorithms with less privacy leakage.
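The contrast between uniform and adaptive budgeting can be illustrated with a minimal Python sketch. It assumes basic (sequential) composition, so the per-step budgets ε_1, …, ε_k must sum to a fixed total, and it assumes each step uses the Laplace mechanism, whose noise scale is sensitivity/ε. The geometric schedule, the `ratio` parameter, and the function names are hypothetical illustrations chosen for clarity; CURATE derives its budgets by solving an optimization problem (minimizing error probability or maximizing iterations), which this sketch does not reproduce.

```python
import math


def uniform_budgets(eps_total: float, k: int) -> list[float]:
    """Split eps_total equally across k steps (uniform budgeting baseline)."""
    return [eps_total / k] * k


def geometric_budgets(eps_total: float, k: int, ratio: float) -> list[float]:
    """Hypothetical adaptive schedule: step i gets a budget proportional to ratio**i.

    ratio < 1 front-loads the budget (less noisy early steps, e.g. initial CI tests
    of a constraint-based algorithm); ratio > 1 back-loads it (less noisy late steps,
    e.g. final iterations of a score-based optimizer). Budgets are normalised so that
    they sum to eps_total, i.e. the total leakage under basic composition.
    """
    weights = [ratio ** i for i in range(k)]
    total = sum(weights)
    return [eps_total * w / total for w in weights]


def laplace_scale(sensitivity: float, eps: float) -> float:
    """Scale b of the Laplace mechanism: noise ~ Laplace(0, sensitivity / eps)."""
    return sensitivity / eps


if __name__ == "__main__":
    eps_total, k = 1.0, 5
    schedules = {
        "uniform": uniform_budgets(eps_total, k),
        "front-loaded": geometric_budgets(eps_total, k, ratio=0.5),
        "back-loaded": geometric_budgets(eps_total, k, ratio=2.0),
    }
    for name, budgets in schedules.items():
        # Basic composition: cumulative leakage is the sum of per-step budgets.
        assert math.isclose(sum(budgets), eps_total)
        # Smaller per-step budget means a larger Laplace noise scale at that step.
        scales = [laplace_scale(1.0, e) for e in budgets]
        print(name, [round(e, 3) for e in budgets], [round(s, 2) for s in scales])
```

All three schedules spend the same total budget; they differ only in which steps receive less noise, which is the degree of freedom adaptive budgeting exploits.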
Cryptography and Security, Information Theory, Machine Learning, Methodology
What problem does this paper attempt to address?
The main problem this paper addresses is how to improve the predictive performance and scalability of Causal Graph Discovery (CGD) algorithms while guaranteeing user privacy. Specifically, the paper proposes a framework named CURATE (CaUsal gRaph AdapTivE privacy), which improves existing Differentially Private (DP) CGD algorithms through adaptive privacy budget allocation.

### Problem Background

1. **Causal Graph Discovery (CGD)**:
   - CGD aims to estimate, from observational data, a partially connected directed acyclic graph (DAG) that represents the joint probability distribution of the features.
   - CGD algorithms fall into two categories:
     - **Constraint-based algorithms**: rely on conditional independence (CI) tests.
     - **Score-based algorithms**: rely on optimizing a score function.
2. **Privacy Threats**:
   - Observational data often contain sensitive information, such as sociodemographic information, credit history, and medical conditions.
   - Publishing causal graphs or intermediate CI test results may leak private information.
3. **Limitations of Existing Methods**:
   - Existing DP-CGD algorithms add the same amount of noise throughout the estimation process. This degrades predictive performance, particularly in the most critical steps: the initial CI tests (constraint-based) and the later iterations of the optimization (score-based).
   - For datasets with many features, more CI tests (and hence more private queries) are needed, so the total privacy leakage increases sharply, which limits scalability.

### Solutions

The CURATE framework addresses these problems as follows:

1. **Adaptive Privacy Budget Allocation**:
   - **Constraint-based algorithms**: CURATE assigns an adaptive privacy budget to each sequence of CI tests so as to minimize the total error probability. In particular, it assigns a higher privacy budget to the initial CI tests to ensure better predictive performance.
   - **Score-based algorithms**: CURATE assigns more privacy budget to the later iterations of the optimization process, reducing the risk of missing the optimal solution and improving the convergence speed.
2. **Theoretical Basis**:
   - The paper analyzes in detail the sensitivity of different CI tests and derives upper bounds on the error probability in terms of Type-I and Type-II errors.
   - The total privacy leakage is calculated using basic composition and advanced composition.
3. **Experimental Verification**:
   - The paper conducts extensive experiments on multiple public datasets; the results show that CURATE achieves higher predictive performance while incurring less privacy leakage.

### Summary

Through adaptive privacy budget allocation, the CURATE framework improves the predictive performance and scalability of causal graph discovery under differential privacy while keeping the cumulative privacy leakage bounded.
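Basic and advanced composition can be compared numerically with a short Python sketch. It uses the standard textbook statements for k mechanisms that are each (ε, δ)-DP: basic composition gives (kε, kδ)-DP, and advanced composition gives (ε′, kδ + δ′)-DP for any δ′ > 0, with ε′ = √(2k ln(1/δ′))·ε + kε(e^ε − 1). The function names and example parameters are illustrative assumptions, and the paper's own accounting (for example, with non-uniform per-step budgets) may be stated differently.

```python
import math


def basic_composition(eps: float, delta: float, k: int) -> tuple[float, float]:
    """k-fold composition of (eps, delta)-DP mechanisms is (k*eps, k*delta)-DP."""
    return k * eps, k * delta


def advanced_composition(
    eps: float, delta: float, k: int, delta_prime: float
) -> tuple[float, float]:
    """Advanced composition for k adaptively composed (eps, delta)-DP mechanisms.

    Returns (eps_total, delta_total) with
      eps_total   = sqrt(2 k ln(1/delta_prime)) * eps + k * eps * (e^eps - 1)
      delta_total = k * delta + delta_prime
    """
    eps_total = (
        math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
        + k * eps * math.expm1(eps)  # expm1(eps) = e^eps - 1, accurate for small eps
    )
    return eps_total, k * delta + delta_prime


if __name__ == "__main__":
    eps_step, delta_step, delta_prime = 0.01, 0.0, 1e-5
    # The number of private steps k grows with the number of CI tests or iterations.
    for k in (10, 100, 1000, 10000):
        b_eps, _ = basic_composition(eps_step, delta_step, k)
        a_eps, a_delta = advanced_composition(eps_step, delta_step, k, delta_prime)
        print(f"k={k:>5}: basic eps={b_eps:.2f}  advanced eps={a_eps:.2f} (delta={a_delta:g})")
```

For a small number of steps, basic composition can be the tighter bound; as k grows, the advanced bound scales roughly with √k rather than k, at the cost of a small additive δ′. This is why the choice of composition rule matters for scalability on datasets that require many CI tests.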