TrialEmulation: An R Package to Emulate Target Trials for Causal Analysis of Observational Time-to-event Data

Li Su, Roonak Rezvani, Shaun R. Seaman, Colin Starr, Isaac Gravestock
2024-02-19
Abstract: Randomised controlled trials (RCTs) are regarded as the gold standard for estimating causal treatment effects on health outcomes. However, RCTs are not always feasible because of time, budget or ethical constraints. Observational data such as those from electronic health records (EHRs) offer an alternative way to estimate the causal effects of treatments. Recently, the 'target trial emulation' framework was proposed by Hernán and Robins (2016) to provide a formal structure for estimating causal treatment effects from observational data. To promote more widespread implementation of target trial emulation in practice, we develop the R package TrialEmulation to emulate a sequence of target trials using observational time-to-event data, where individuals who start to receive treatment and those who have not been on the treatment at the baseline of the emulated trials are compared in terms of their risks of an outcome event. Specifically, TrialEmulation provides (1) data preparation for emulating a sequence of target trials, (2) calculation of the inverse probability of treatment and censoring weights to handle treatment switching and dependent censoring, (3) fitting of marginal structural models for the time-to-event outcome given baseline covariates, and (4) estimation and inference of marginal intention-to-treat and per-protocol effects of the treatment in terms of marginal risk differences between treated and untreated for a user-specified target trial population. In particular, TrialEmulation can accommodate large data sets (e.g., from EHRs) within memory constraints of R by processing data in chunks and applying case-control sampling. We demonstrate the functionality of TrialEmulation using a simulated data set that mimics typical observational time-to-event data in practice.
Computation, Methodology
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the problem of how to use observational time-to-event data (such as data in electronic health records) to estimate causal treatment effects. Specifically, the authors developed an R package named `TrialEmulation` for emulating target trials when randomized controlled trials (RCTs) are not feasible because of time, budget or ethical constraints.

### Background and motivation

1. **Randomized controlled trials (RCTs)**
   - RCTs are the gold standard for estimating the causal effect of treatment on health outcomes.
   - Randomly assigning individuals to receive treatment or not ensures that the treated and untreated groups are comparable at the time of treatment allocation.
2. **Challenges of observational data**
   - RCTs are not always feasible due to time, budget or ethical limitations.
   - Observational data (such as electronic health records) provide an alternative, but estimating causal effects directly from these data is not straightforward.
   - Treatment allocation is usually not random, so baseline differences between the treated and untreated groups may partly explain the observed outcome differences.
3. **Target trial emulation framework**
   - Hernán and Robins (2016) proposed the "target trial emulation" framework, which provides a formal structure for estimating causal effects using observational data.
   - This framework involves specifying a protocol for the target trial that one wishes to conduct, identifying individuals in the observational database who meet the eligibility criteria of this target trial, and comparing the outcomes of the treated and untreated groups of these individuals during the trial follow-up.

### Functions of the `TrialEmulation` package

1. **Data preparation**
   - Prepare data for emulating a sequence of target trials.
   - Expand the original dataset to create the datasets corresponding to the sequence of target trials.
2. **Handling treatment switching and dependent censoring**
   - Calculate inverse probability of treatment weights and inverse probability of censoring weights to handle treatment switching and dependent censoring.
3. **Fitting marginal structural models**
   - Fit marginal structural models for the time-to-event outcome given baseline covariates, using the inverse probability weights.
4. **Estimating and inferring treatment effects**
   - Estimate and make inference on the intention-to-treat (ITT) effect and the per-protocol (PP) effect of treatment.
   - Express the treatment effect as the marginal risk difference between the treated and untreated groups.

### Technical details

1. **Inverse probability of censoring weights**
   - Dependent censoring is handled through inverse probability of censoring weights (IPCW).
   - The formula is as follows:
     \[ SWC_k = \prod_{j=0}^{k-1} \frac{\Pr(C_j = 0 \mid C_{j-1} = 0, Y_j = 0, A_0, V, L_0)}{\Pr(C_j = 0 \mid C_{j-1} = 0, Y_j = 0, A_j, V, L_j)} \]
   - Here \(SWC_0 = 1\) (the empty product), so no censoring weight is applied at baseline.
2. **Marginal structural models**
   - A weighted pooled logistic regression model is fitted for the potential time-to-event outcome.
   - For example, the model can be represented as:
     \[ \text{logit}\{\Pr(Y_k = 1 \mid Y_{k-1} = 0, A_0 = a, V, L_0)\} = \beta_0 + \beta_1 k + \beta_2 k^2 + \beta_3 a + \beta_4^{\top} V + \beta_5^{\top} L_0 \]
3. **Handling large datasets**
   - The `TrialEmulation` package can handle large datasets by processing data in chunks and applying case-control sampling to reduce the computational burden.

### Conclusion

`TrialEmulat