Conformal causal inference for cluster randomized trials: model-robust inference without asymptotic approximations

Bingkai Wang, Fan Li, Mengxin Yu
DOI: https://doi.org/10.48550/arXiv.2401.01977
2024-10-02
Abstract: Traditional statistical inference in cluster randomized trials typically invokes the asymptotic theory that requires the number of clusters to approach infinity. In this article, we propose an alternative conformal causal inference framework for analyzing cluster randomized trials that achieves the target inferential goal in finite samples without the need for asymptotic approximations. Different from traditional inference focusing on estimating the average treatment effect, our conformal causal inference aims to provide prediction intervals for the difference of counterfactual outcomes, thereby providing a new decision-making tool for clusters and individuals in the same target population. We prove that this framework is compatible with arbitrary working outcome models -- including data-adaptive machine learning methods that maximally leverage information from baseline covariates, and enjoys robustness against misspecification of working outcome models. Under our conformal causal inference framework, we develop efficient computation algorithms to construct prediction intervals for treatment effects at both the cluster and individual levels, and further extend to address inferential targets defined based on pre-specified covariate subgroups. Finally, we demonstrate the properties of our methods via simulations and a real data application based on a completed cluster randomized trial for treating chronic pain.
Methodology
What problem does this paper attempt to address?
The paper addresses a limitation of statistical inference in cluster randomized trials (CRTs): traditional methods usually rely on asymptotic theory that requires the number of clusters to approach infinity. That approximation is often unreliable in practice, especially when the number of clusters is small. To overcome this limitation, the paper proposes a conformal causal inference framework that achieves the target inferential goal in finite samples without asymptotic approximations. Specifically, the framework provides prediction intervals for the difference of counterfactual outcomes, giving clusters and individuals in the same target population a new decision-making tool. Unlike traditional analyses that estimate the average treatment effect, the method yields prediction intervals for treatment effects at the cluster or individual level, and these intervals are robust to misspecification of the working outcome model.

### Main contributions

1. **Model robustness**: The proposed method is model-robust in finite samples under minimal assumptions and is compatible with any working model, including data-adaptive machine learning methods.
2. **Prediction intervals**: It constructs prediction intervals for treatment effects at the cluster level and the individual level, and proves the finite-sample validity of these intervals.
3. **Subgroup analysis**: It extends the method to subgroup analysis, that is, to achieving the required coverage probability within pre-specified covariate subgroups.
4. **Numerical algorithms**: It develops efficient computational algorithms to construct prediction intervals for treatment effects, and evaluates the method through simulations and a real-data application.

### Method overview

- **Potential outcome framework**: Use the potential outcome framework to define treatment effects at the cluster level and the individual level.
- **Conformal prediction**: Use conformal prediction to construct prediction intervals that attain the required coverage probability in finite samples.
- **Algorithm steps**:
  - **Step 1**: For each potential outcome \(Y(a)\), use split conformal prediction to construct a prediction interval \(\tilde{C}_C(a)\).
  - **Step 2**: Using the observed treatment assignment \(A_{\text{test}}\) and the average outcome \(Y_{\text{test}}\), construct the final conformal interval \(\tilde{C}_C(O_{\text{test}})\) for the cluster-level treatment effect.
  - **Step 3**: For the individual-level treatment effect, construct a prediction interval \(\tilde{C}_I(O_{\text{test}})\) in the same way.

### Theoretical guarantees

- **Theorem 1**: Under Assumptions 1-2, for any covariate subset \(\Omega_C\) with positive measure, the conformal interval \(\tilde{C}_C(O_{\text{test}})\) satisfies \(P\{Y_{\text{test}}(1)-Y_{\text{test}}(0)\in\tilde{C}_C(O_{\text{test}})\mid B_{\text{test}}\in\Omega_C\}\geq 1 - 2\alpha\).
- **Corollary 1**: For a new cluster with only covariate information, an interval formed by combining the conformal intervals \(\tilde{C}_{C,1}(B_{\text{test}})\) and \(\tilde{C}_{C,0}(B_{\text{test}})\) satisfies a similar coverage guarantee.

### Practical applications

- **Numerical experiments**: The method is evaluated through simulations and through an application to real data from a completed cluster randomized trial for treating chronic pain.

In conclusion, the conformal causal inference framework proposed in this paper offers a model-robust tool for statistical inference in cluster randomized trials, with coverage guarantees that hold in finite samples. It is especially suited to trials with a limited number of clusters.
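
The algorithm steps in the method overview can be illustrated with a minimal Python sketch of the cluster-level case. This is not the authors' implementation. It assumes the data are already aggregated to one row per cluster (covariates `B`, arm `A`, average outcome `Y`), uses ordinary least squares as the working model, and uses absolute residuals as the conformity score. The function names `split_conformal_interval` and `cluster_effect_interval` are hypothetical, and subgroup conditioning is omitted.

```python
import math
import numpy as np


def split_conformal_interval(B, Y, b_test, alpha, rng):
    """Split conformal interval for one arm's outcome at covariates b_test.

    B is an (n, p) array of cluster-level covariates and Y an (n,) array of
    cluster-average outcomes, both restricted to a single arm.
    """
    n = len(Y)
    idx = rng.permutation(n)
    train, cal = idx[: n // 2], idx[n // 2:]

    # Working model (assumption): least squares with an intercept.
    # Any other fitted predictor could be substituted here.
    X_train = np.column_stack([np.ones(len(train)), B[train]])
    coef, *_ = np.linalg.lstsq(X_train, Y[train], rcond=None)

    def predict(b):
        b = np.atleast_2d(b)
        return np.column_stack([np.ones(len(b)), b]) @ coef

    # Conformity scores: absolute residuals on the calibration half.
    scores = np.sort(np.abs(Y[cal] - predict(B[cal])))
    k = math.ceil((len(cal) + 1) * (1 - alpha))
    # With too few calibration clusters the interval is the whole real line.
    q = scores[k - 1] if k <= len(cal) else np.inf

    center = predict(b_test)[0]
    return center - q, center + q


def cluster_effect_interval(B, A, Y, b_test, a_test, y_test, alpha=0.1, seed=0):
    """Interval for Y(1) - Y(0) of a test cluster whose arm and outcome are observed."""
    rng = np.random.default_rng(seed)
    # Only the unobserved potential outcome has to be predicted, so the
    # interval is calibrated on clusters from the opposite arm.
    mask = A == (1 - a_test)
    lo, hi = split_conformal_interval(B[mask], Y[mask], b_test, alpha, rng)
    if a_test == 1:
        return y_test - hi, y_test - lo  # observed Y(1) minus interval for Y(0)
    return lo - y_test, hi - y_test      # interval for Y(1) minus observed Y(0)
```

Calling `cluster_effect_interval(B, A, Y, b_test, a_test=1, y_test=3.0)` returns a pair `(lower, upper)`. The interval becomes infinite when the opposite arm contributes too few calibration clusters for the requested `alpha`, which reflects the small-sample setting the paper targets rather than a failure of the sketch.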