Abstract: Randomised controlled trials (RCTs) are regarded as the gold standard for estimating causal treatment effects on health outcomes. However, RCTs are not always feasible, because of time, budget or ethical constraints. Observational data such as those from electronic health records (EHRs) offer an alternative way to estimate the causal effects of treatments. Recently, the 'target trial emulation' framework was proposed by Hernán and Robins (2016) to provide a formal structure for estimating causal treatment effects from observational data. To promote more widespread implementation of target trial emulation in practice, we develop the R package TrialEmulation to emulate a sequence of target trials using observational time-to-event data, where individuals who start to receive treatment and those who have not been on the treatment at the baseline of the emulated trials are compared in terms of their risks of an outcome event. Specifically, TrialEmulation provides (1) data preparation for emulating a sequence of target trials, (2) calculation of the inverse probability of treatment and censoring weights to handle treatment switching and dependent censoring, (3) fitting of marginal structural models for the time-to-event outcome given baseline covariates, and (4) estimation and inference of marginal intention-to-treat and per-protocol effects of the treatment in terms of marginal risk differences between treated and untreated for a user-specified target trial population. In particular, TrialEmulation can accommodate large data sets (e.g., from EHRs) within memory constraints of R by processing data in chunks and applying case-control sampling. We demonstrate the functionality of TrialEmulation using a simulated data set that mimics typical observational time-to-event data in practice.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses how to use observational time-to-event data (such as data in electronic health records) to estimate causal treatment effects. Specifically, the authors developed an R package named `TrialEmulation` for emulating target trials in settings where randomized controlled trials (RCTs) are infeasible because of time, budget or ethical constraints.

### Background and motivation

1. **Randomized controlled trials (RCTs)**
   - RCTs are the gold standard for estimating the causal effect of treatment on health outcomes.
   - Random assignment of individuals to treatment or no treatment ensures that the treated and untreated groups are comparable at the time of treatment allocation.
2. **Challenges of observational data**
   - RCTs are not always feasible due to time, budget or ethical limitations.
   - Observational data (such as electronic health records) provide an alternative, but estimating causal effects directly from these data is not straightforward.
   - Treatment allocation is usually not random, so baseline differences between the treated and untreated groups may partly explain the observed outcome differences.
3. **Target trial emulation framework**
   - Hernán and Robins (2016) proposed the "target trial emulation" framework, which provides a formal structure for estimating causal effects using observational data.
   - This framework involves specifying a protocol for the target trial that one wishes to conduct, identifying individuals in the observational database who meet the eligibility criteria of this target trial, and comparing the outcomes of the treated and untreated groups of these individuals during the trial follow-up.

### Functions of the `TrialEmulation` package

1. **Data preparation**
   - Prepare data for emulating a sequence of target trials.
   - Expand the original dataset to create datasets corresponding to the sequence of target trials.
2. **Handling treatment switching and dependent censoring**
   - Calculate inverse probability of treatment weights and inverse probability of censoring weights to handle treatment switching and dependent censoring.
3. **Fitting marginal structural models**
   - Fit marginal structural models, weighted by the inverse probability weights, for the time-to-event outcome given baseline covariates.
4. **Estimating and inferring treatment effects**
   - Estimate and infer the intention-to-treat (ITT) effect and the per-protocol (PP) effect of treatment.
   - Express the treatment effect as the marginal risk difference between the treated and untreated groups.

### Technical details

1. **Inverse probability of censoring weights**
   - Handle dependent censoring through inverse probability of censoring weights (IPCW).
   - The stabilised weight at visit \(k\) is:
     \[ SWC_k = \prod_{j=0}^{k-1} \frac{\Pr(C_j = 0 \mid C_{j-1} = 0, Y_j = 0, A_0, V, L_0)}{\Pr(C_j = 0 \mid C_{j-1} = 0, Y_j = 0, A_j, V, L_j)} \]
   - Here \(SWC_0 = 1\), so no censoring weight is applied at baseline.
2. **Marginal structural models**
   - Use a weighted pooled logistic regression model for the potential time-to-event outcomes.
   - For example, the model can be represented as:
     \[ \text{logit}\{\Pr(Y_k = 1 \mid Y_{k-1} = 0, A_0 = a, V, L_0)\} = \beta_0 + \beta_1 k + \beta_2 k^2 + \beta_3 a + \beta_4^{\top} V + \beta_5^{\top} L_0 \]
3. **Handling large datasets**
   - The `TrialEmulation` package can handle large datasets by processing data in chunks and applying case-control sampling to reduce the computational burden.

### Conclusion

`TrialEmulat
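The two formulas in the technical details above can be illustrated with a small Python sketch. It does not use the `TrialEmulation` API: the function names, the input probabilities, the coefficient values and the visit-indexing convention for cumulative risk are all hypothetical, chosen only to show how the stabilised censoring weight accumulates over visits and how a fitted discrete-time hazard model can be turned into a marginal risk difference by averaging over a target population.

```python
import math


def stabilised_censoring_weights(p_uncens_num, p_uncens_den):
    """Return [SWC_0, ..., SWC_K] for one individual in one emulated trial.

    p_uncens_num[j]: Pr(C_j = 0 | C_{j-1} = 0, Y_j = 0, A_0, V, L_0)
    p_uncens_den[j]: Pr(C_j = 0 | C_{j-1} = 0, Y_j = 0, A_j, V, L_j)
    Both lists are hypothetical inputs; in practice they would be
    predictions from fitted censoring models.
    """
    if len(p_uncens_num) != len(p_uncens_den):
        raise ValueError("need one numerator and one denominator per visit")
    weights = [1.0]  # SWC_0 = 1: the product over j = 0..k-1 is empty at k = 0
    for num, den in zip(p_uncens_num, p_uncens_den):
        if not 0.0 < den <= 1.0:
            raise ValueError("denominator probabilities must lie in (0, 1]")
        weights.append(weights[-1] * num / den)
    return weights


def hazard(k, a, v, l0, beta):
    """Discrete-time hazard Pr(Y_k = 1 | Y_{k-1} = 0, A_0 = a, V, L_0)."""
    b0, b1, b2, b3, b4, b5 = beta
    linear = (b0 + b1 * k + b2 * k ** 2 + b3 * a
              + sum(c * x for c, x in zip(b4, v))
              + sum(c * x for c, x in zip(b5, l0)))
    return 1.0 / (1.0 + math.exp(-linear))


def cumulative_risk(n_visits, a, v, l0, beta):
    """Risk of the event over visits 0..n_visits-1 (assumed indexing)."""
    survival = 1.0
    for k in range(n_visits):
        survival *= 1.0 - hazard(k, a, v, l0, beta)
    return 1.0 - survival


def marginal_risk_difference(n_visits, population, beta):
    """Average risk under a = 1 minus average risk under a = 0.

    population: list of (V, L_0) covariate pairs for the target population.
    """
    treated = [cumulative_risk(n_visits, 1, v, l0, beta) for v, l0 in population]
    untreated = [cumulative_risk(n_visits, 0, v, l0, beta) for v, l0 in population]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# Illustrative values only; none of these numbers come from the paper.
weights = stabilised_censoring_weights([0.95, 0.90], [0.90, 0.80])
beta = (-3.0, 0.1, 0.0, -0.5, [0.2], [0.3])
population = [([0.0], [1.0]), ([1.0], [0.0])]
risk_diff = marginal_risk_difference(5, population, beta)
print(weights, risk_diff)
```

Because the treatment coefficient in this toy example is negative, the hazard is lower under treatment at every visit, and the resulting risk difference is negative. Obtaining confidence intervals would require a separate step that this sketch does not attempt.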

TrialEmulation: An R Package to Emulate Target Trials for Causal Analysis of Observational Time-to-event Data
