Federated Causal Inference: Multi-Centric ATE Estimation beyond Meta-Analysis

Rémi Khellaf, Aurélien Bellet, Julie Josse
2024-10-22
Abstract: We study Federated Causal Inference, an approach to estimate treatment effects from decentralized data across centers. We compare three classes of Average Treatment Effect (ATE) estimators derived from the Plug-in G-Formula, ranging from simple meta-analysis to one-shot and multi-shot federated learning, the latter leveraging the full data to learn the outcome model (albeit requiring more communication). Focusing on Randomized Controlled Trials (RCTs), we derive the asymptotic variance of these estimators for linear models. Our results provide practical guidance on selecting the appropriate estimator for various scenarios, including heterogeneity in sample sizes, covariate distributions, treatment assignment schemes, and center effects. We validate these findings with a simulation study.
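For reference, the plug-in G-formula that all three estimator classes build on can be written as below. The notation is assumed here rather than taken from the paper: center $k$ holds $n_k$ units, $n = \sum_k n_k$, and $\hat{\mu}_w(x)$ is a fitted outcome model estimating $\mathbb{E}[Y \mid X = x, W = w]$ for treatment $w \in \{0, 1\}$. The sample-size weights in the second expression are one natural aggregation choice, not necessarily the one the authors use.

```latex
\hat{\tau}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} \left( \hat{\mu}_1(X_{ik}) - \hat{\mu}_0(X_{ik}) \right),
\qquad
\hat{\tau} = \sum_{k=1}^{K} \frac{n_k}{n} \, \hat{\tau}_k
```

The three classes differ only in how $\hat{\mu}_w$ is obtained: fitted on center $k$ alone (meta-analysis), built from aggregated local parameters (one-shot), or learned from all centers' data by federated gradient descent (multi-shot).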
Machine Learning, Statistics Theory
What problem does this paper attempt to address?
The paper addresses how to accurately estimate the average treatment effect (ATE) when the data from a randomized controlled trial (RCT) are scattered across multiple centers (such as different hospitals or research institutions). Specifically, the authors study the effectiveness and efficiency of federated causal inference methods for estimating the ATE when centers are heterogeneous, for example in sample size, covariate distribution, treatment assignment scheme, and center effects. The main contribution is a comparison of three classes of ATE estimators:

1. **Meta-analysis estimators**: each center independently computes its own ATE estimate, and these estimates are then aggregated.
2. **One-shot federated estimators**: each center estimates the outcome model parameters locally; the parameters are aggregated and sent back to the centers, which use them to compute local ATE estimates that are in turn aggregated.
3. **Gradient-based (multi-shot) federated estimators**: the outcome model parameters are learned from the combined data of all centers by federated gradient descent, at the cost of more communication rounds; each center then computes its ATE estimate, and the estimates are aggregated.

The authors also derive the asymptotic variances of these estimators for linear outcome models and use them to give practical guidance on which estimator to choose when centers differ in sample size, covariate distribution, treatment assignment scheme, or center effects. These findings are validated in a simulation study.

In summary, the paper offers a federated approach to causal inference in multi-center settings, which is especially suited to medicine, where patient privacy must be protected and data regulations complied with.
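The three estimator classes listed above can be contrasted in a small simulation. This is a minimal sketch under assumptions that are ours, not the paper's: three toy RCT centers share one linear outcome model (no center effects) but have shifted covariates, the outcome model is one linear regression per treatment arm, every aggregation step uses sample-size weights, and federated gradient descent runs with a fixed learning rate and number of rounds. All names (`simulate_center`, `meta_analysis`, `one_shot`, `federated_gd`, `multi_shot`, `TRUE_ATE`) are hypothetical.

```python
import numpy as np

TRUE_ATE = 2.0  # assumed constant treatment effect in this toy simulation

def simulate_center(rng, n, shift, p_treat=0.5):
    """Toy RCT for one center (all modeling choices are assumptions):
    covariates shifted by `shift`, Bernoulli(p_treat) treatment, and a
    linear outcome shared by every center (no center effect)."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    W = rng.binomial(1, p_treat, size=n)
    Y = X @ np.array([1.0, -0.5]) + TRUE_ATE * W + rng.normal(size=n)
    return X, W, Y

def design(X):
    # Prepend an intercept column to the covariates.
    return np.column_stack([np.ones(len(X)), X])

def ols(X, y):
    # Ordinary least squares on one arm's data, intercept included.
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def plug_in_ate(X, coef1, coef0):
    # Plug-in G-formula: average mu1(x) - mu0(x) over this center's covariates.
    D = design(X)
    return float(np.mean(D @ coef1 - D @ coef0))

def size_weighted(values, sizes):
    # Weighted average over centers with weights proportional to `sizes`.
    sizes = np.asarray(sizes, dtype=float)
    return np.tensordot(sizes / sizes.sum(), np.asarray(values), axes=1)

def meta_analysis(centers):
    # Each center fits its own per-arm models and shares only its local ATE.
    ates = [plug_in_ate(X, ols(X[W == 1], Y[W == 1]), ols(X[W == 0], Y[W == 0]))
            for X, W, Y in centers]
    return float(size_weighted(ates, [len(Y) for _, _, Y in centers]))

def one_shot(centers):
    # One round: centers send per-arm coefficients, the server averages them
    # (weighted by arm sizes) and broadcasts them back for local plug-in ATEs.
    coefs = {}
    for arm in (0, 1):
        coefs[arm] = size_weighted(
            [ols(X[W == arm], Y[W == arm]) for X, W, Y in centers],
            [(W == arm).sum() for _, W, _ in centers])
    ates = [plug_in_ate(X, coefs[1], coefs[0]) for X, _, _ in centers]
    return float(size_weighted(ates, [len(Y) for _, _, Y in centers]))

def federated_gd(centers, arm, lr=0.1, n_iter=2000):
    # Multi-shot: every round, each center sends the gradient of its local
    # squared loss for one arm; the server sums them and takes a step.
    local = [(design(X[W == arm]), Y[W == arm]) for X, W, Y in centers]
    n_arm = sum(len(y) for _, y in local)
    beta = np.zeros(local[0][0].shape[1])
    for _ in range(n_iter):
        grad = sum(D.T @ (D @ beta - y) for D, y in local)
        beta = beta - lr * grad / n_arm
    return beta

def multi_shot(centers, **gd_kwargs):
    # Learn per-arm coefficients on all centers' data, then plug in locally.
    coef1 = federated_gd(centers, arm=1, **gd_kwargs)
    coef0 = federated_gd(centers, arm=0, **gd_kwargs)
    ates = [plug_in_ate(X, coef1, coef0) for X, _, _ in centers]
    return float(size_weighted(ates, [len(Y) for _, _, Y in centers]))

rng = np.random.default_rng(0)
centers = [simulate_center(rng, n, shift)
           for n, shift in [(200, 0.0), (500, 0.5), (1000, 1.0)]]
estimates = {
    "meta": meta_analysis(centers),
    "one_shot": one_shot(centers),
    "multi_shot": multi_shot(centers),
}
print(estimates)
```

Because the toy outcome model is correctly specified and shared by all centers, all three estimates should land near `TRUE_ATE`, and the gradient-based coefficients converge to the pooled least-squares fit. Seeing the variance differences the paper analyzes (for instance under center effects or unequal treatment assignment) would require changing `simulate_center` and repeating the simulation many times.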