Abstract: The g-formula can be used to estimate causal effects of sustained treatment strategies using observational data under the identifying assumptions of consistency, positivity, and exchangeability. The non-iterative conditional expectation (NICE) estimator of the g-formula also requires correct estimation of the conditional distribution of the time-varying treatment, confounders, and outcome. Parametric models, which have been traditionally used for this purpose, are subject to model misspecification, which may result in biased causal estimates. Here, we propose a unified deep learning framework for the NICE g-formula estimator that uses multitask recurrent neural networks for estimation of the joint conditional distributions. Using simulated data, we evaluated our model's bias and compared it with that of the parametric g-formula estimator. We found lower bias in the estimates of the causal effect of sustained treatment strategies on a survival outcome when using the deep learning estimator compared with the parametric NICE estimator in settings with simple and complex temporal dependencies between covariates. These findings suggest that our Deep Learning g-formula estimator may be less sensitive to model misspecification than the classical parametric NICE estimator when estimating the causal effect of sustained treatment strategies from complex observational data.

What problem does this paper attempt to address?

The paper asks how deep learning can be used to estimate the causal effects of sustained treatment strategies from complex observational data, under both simple and complex temporal dependencies. Specifically, the authors propose a non-iterative conditional expectation (NICE) g-formula estimator based on multitask recurrent neural networks (such as LSTMs). The aim is to reduce the bias in causal effect estimates that arises when traditional parametric models are misspecified.

### Core Problems of the Paper

1. **Model Misspecification Problem in Causal Effect Estimation**
   - Traditional parametric models (such as generalized linear models, GLMs) are prone to misspecification when estimating the conditional distributions of time-varying covariates, treatments, and outcomes, which may bias the causal effect estimates.
2. **Causal Inference under Complex Time Dependencies**
   - Real-world data may contain complex long-term dependencies between covariates and treatments. Traditional parametric methods have difficulty capturing these dependencies, whereas deep-learning methods (such as LSTMs) are better suited to them.

### Solutions

The paper proposes a unified deep-learning framework that uses multitask recurrent neural networks (such as LSTMs) to estimate the joint conditional distribution. The advantages of this method are:

- **No Need to Explicitly Specify the Functional Forms between Covariates**: Deep-learning models such as LSTMs can learn complex relationships between covariates from the data, reducing the dependence on correct model specification.
- **Better Handling of Complex Time Dependencies**: LSTMs are well suited to modeling longitudinal data and can capture potential long-term dependencies in the covariate trajectories.
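The multitask idea described above can be sketched as one shared recurrent state feeding a separate output head per task. This is a minimal, untrained illustration and not the authors' architecture: a plain tanh recurrent cell stands in for the LSTM, the class name `MultitaskRecurrentSketch`, the three head names, the binary variables, and the random weights are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class MultitaskRecurrentSketch:
    """Shared tanh recurrent state with one sigmoid head per task.

    Hypothetical stand-in for a multitask LSTM: weights are random and
    untrained, so the outputs only illustrate the data flow.
    """

    TASKS = ("confounder", "treatment", "outcome")

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.n_hidden = n_hidden
        self.W_in = rand_matrix(n_hidden, n_inputs)
        self.W_rec = rand_matrix(n_hidden, n_hidden)
        # One weight vector per task, all reading the same hidden state.
        self.heads = {task: rand_matrix(1, n_hidden)[0] for task in self.TASKS}

    def forward(self, sequence):
        """Return, for each time step, a predicted probability per task."""
        h = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            pre = [a + b for a, b in zip(matvec(self.W_in, x),
                                         matvec(self.W_rec, h))]
            h = [math.tanh(p) for p in pre]
            outputs.append({
                task: sigmoid(sum(w * hi for w, hi in zip(weights, h)))
                for task, weights in self.heads.items()
            })
        return outputs


def multitask_loss(pred, target):
    """Sum of binary cross-entropies over the tasks at one time step."""
    return -sum(
        target[t] * math.log(pred[t]) + (1 - target[t]) * math.log(1 - pred[t])
        for t in pred
    )


# One individual's history: each row is [L_k, A_k, Y_k] (all binary here).
history = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0]]
model = MultitaskRecurrentSketch(n_inputs=3, n_hidden=4)
outputs = model.forward(history)

# The step-0 output is scored against the step-1 values it is meant to predict.
loss = multitask_loss(outputs[0], {"confounder": 1, "treatment": 1, "outcome": 0})
```

Sharing the hidden state means every head sees the same summary of the past, which is what lets a single network estimate the joint conditional distribution instead of requiring one separately specified model per variable.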
### Research Methods

The authors evaluated the proposed deep-learning NICE g-formula estimator on simulated data and compared it with the traditional parametric NICE g-formula estimator. The results show that, under both simple and complex temporal dependencies, the deep-learning estimator has lower bias, and the difference is more pronounced at large sample sizes.

### Main Conclusions

- The deep-learning NICE g-formula estimator generally has lower bias than the traditional parametric NICE method when estimating the causal effects of sustained treatment strategies in complex observational data.
- Even under simple temporal dependencies, the deep-learning estimator shows less bias.
- However, the performance of the deep-learning estimator may be sensitive to sample size and to the complexity of the data structure. Future research should further explore the conditions for statistical inference and model selection strategies.

### Formula Summary

For causal effect estimation, the paper uses the following key formulas:

- **g-formula**

  \[
  E(Y_g)=\sum_{\forall \bar{l}_{K-1}}\sum_{k=1}^{K}P(Y_k=1 \mid Y_{k-1}=0,\bar{L}_{k-1}=\bar{l}_{k-1},\bar{A}_{k-1}=\bar{a}_g^{k-1})\times\prod_{s=0}^{k-1}P(Y_s=0 \mid Y_{s-1}=0,\bar{L}_{s-1}=\bar{l}_{s-1},\bar{A}_{s-1}=\bar{a}_g^{s-1})\,f(l_s \mid Y_s=0,\bar{l}_{s-1},\bar{a}_g^{s-1})
  \]

  where \(\bar{X}_k=(X_0,\ldots,X_k)\) denotes the history of the random variable \(X\) through time \(k\).

- **Bias Definition**

  \[
  \text{Bias}(\hat{R}_{method,k})=\hat{R}_{method,k}-R_{true,k}
  \]

  For causal effect estimation (risk ratio RR and risk difference RD), the biases are defined as:

  \[
  \text{Bias}(\hat{RR}_{method,k})=\hat{RR}_{method,k}-RR_{true,k}
  \]
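The g-formula and the bias definition summarized above can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration: the binary confounder, the logistic models `p_confounder` and `hazard` with made-up coefficients, the two static strategies, and the simplified indexing in which the hazard in interval k depends only on the current `L_k` and `A_k`. Because the conditional models are known rather than fitted, the only source of bias in this example is Monte Carlo error, not model misspecification.

```python
import math
import random
from itertools import product


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical conditional models; coefficients are made up for illustration.
def p_confounder(prev_l, prev_a):
    """P(L_k = 1 | survived so far, L_{k-1}, A_{k-1})."""
    return expit(-0.5 + 1.0 * prev_l - 0.8 * prev_a)


def hazard(l, a):
    """P(event in interval k | survived so far, L_k, A_k)."""
    return expit(-3.0 + 1.2 * l - 0.7 * a)


def risk_and_density(l_hist, strategy):
    """For one covariate history: cumulative risk and the history's density."""
    prev_l, prev_a = 0, 0
    surv, risk, density = 1.0, 0.0, 1.0
    for k, l in enumerate(l_hist):
        p1 = p_confounder(prev_l, prev_a)
        density *= p1 if l == 1 else 1.0 - p1
        a = strategy(k, l)
        h = hazard(l, a)
        risk += surv * h      # hazard times probability of surviving to k
        surv *= 1.0 - h
        prev_l, prev_a = l, a
    return risk, density


def gformula_exact(strategy, n_steps):
    """Sum over every binary covariate history (feasible only for small n_steps)."""
    return sum(r * d for r, d in
               (risk_and_density(h, strategy)
                for h in product((0, 1), repeat=n_steps)))


def gformula_monte_carlo(strategy, n_steps, n_sim=5000, seed=0):
    """Approximate the same sum by simulating covariate histories forward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        prev_l, prev_a = 0, 0
        surv, risk = 1.0, 0.0
        for k in range(n_steps):
            l = 1 if rng.random() < p_confounder(prev_l, prev_a) else 0
            a = strategy(k, l)
            h = hazard(l, a)
            risk += surv * h
            surv *= 1.0 - h
            prev_l, prev_a = l, a
        total += risk
    return total / n_sim


def always_treat(k, l):
    return 1


def never_treat(k, l):
    return 0


K = 5
strategies = {"always": always_treat, "never": never_treat}
true_risk = {name: gformula_exact(g, K) for name, g in strategies.items()}
est_risk = {name: gformula_monte_carlo(g, K) for name, g in strategies.items()}

# Bias = estimate - truth, for the risks, the risk ratio, and the risk difference.
bias_risk = {name: est_risk[name] - true_risk[name] for name in strategies}
rr_true = true_risk["always"] / true_risk["never"]
rd_true = true_risk["always"] - true_risk["never"]
bias_rr = est_risk["always"] / est_risk["never"] - rr_true
bias_rd = (est_risk["always"] - est_risk["never"]) - rd_true
```

The exact enumeration grows as 2^K and is only practical for a toy example, which is why forward simulation is the usual way to evaluate the sum. In a real analysis the conditional models would be estimated from data (parametrically or with a neural network), and errors in those fitted models would then contribute to the bias.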

Deep Learning Methods for the Noniterative Conditional Expectation G-Formula for Causal Inference from Complex Observational Data
