Liangyuan Hu, Jiayi Ji, Ronald D. Ennis, Joseph W. Hogan
Abstract: When drawing causal inferences about the effects of multiple treatments on clustered survival outcomes using observational data, we need to address implications of the multilevel data structure, multiple treatments, censoring and unmeasured confounding for causal analyses. Few off-the-shelf causal inference tools are available to simultaneously tackle these issues. We develop a flexible random-intercept accelerated failure time model, in which we use Bayesian additive regression trees to capture arbitrarily complex relationships between censored survival times and pre-treatment covariates and use the random intercepts to capture cluster-specific main effects. We develop an efficient Markov chain Monte Carlo algorithm to draw posterior inferences about the population survival effects of multiple treatments and examine the variability in cluster-level effects. We further propose an interpretable sensitivity analysis approach to evaluate the sensitivity of drawn causal inferences about treatment effect to the potential magnitude of departure from the causal assumption of no unmeasured confounding. Expansive simulations empirically validate and demonstrate good practical operating characteristics of our proposed methods. Applying the proposed methods to a dataset on older high-risk localized prostate cancer patients drawn from the National Cancer Database, we evaluate the comparative effects of three treatment approaches on patient survival, and assess the ramifications of potential unmeasured confounding. The methods developed in this work are readily available in the $\textsf{R}$ package $\textsf{riAFTBART}$.
### What problems does this paper attempt to solve?
This paper addresses the challenges of drawing causal inferences from observational data when survival outcomes are multilevel (clustered within institutions), there are multiple treatment options, survival times are censored, and unmeasured confounding may be present. Specifically:
1. **Multiple treatment options**: For patients with high-risk localized prostate cancer, there are three main treatment options: radical prostatectomy (RP), external-beam radiation therapy combined with androgen deprivation therapy (EBRT + AD), and external-beam radiation therapy plus brachytherapy with or without androgen deprivation (EBRT + brachy ± AD). Treatment choice is closely related to patients' health and demographic characteristics, so unadjusted comparisons across treatment groups are confounded.
2. **Multilevel data structure**: The data come from the National Cancer Database (NCDB), which collects data from more than 1,500 accredited cancer treatment facilities. Treatment effects may differ substantially across institutions, and because these institutions are not randomly selected, inter-institutional variation can be large.
3. **Censored data**: Survival times may be right-censored; that is, the survival times of some patients are not fully observed, and only a lower bound (such as the time of last follow-up) is known.
4. **Unmeasured confounding**: Some important confounders (such as the number of positive biopsy cores and magnetic resonance imaging results) may not be recorded in the observational data, which can bias causal inferences.
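In AFT models fitted by MCMC, right-censoring is commonly handled by data augmentation: at each iteration, every censored log survival time is replaced by a draw from the current model's normal distribution truncated below at the log censoring time, after which the model is updated as if all times were observed. The sketch below illustrates that single step with Python's standard library. It assumes normal errors on the log-time scale, and the function names are hypothetical; it is not taken from riAFTBART.

```python
import random
from statistics import NormalDist


def impute_censored_log_time(mean, sd, log_censor_time, rng):
    """Draw log T ~ N(mean, sd^2) truncated to (log_censor_time, inf) by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    lower = dist.cdf(log_censor_time)      # CDF value at the censoring point
    u = rng.uniform(lower, 1.0)            # uniform over the CDF range above it
    u = min(max(u, 1e-15), 1.0 - 1e-15)    # inv_cdf requires 0 < u < 1
    # max() guards the extreme case where clamping pushed the draw below the bound
    return max(dist.inv_cdf(u), log_censor_time)


def augment_log_times(log_times, events, means, sd, rng):
    """Keep observed log event times (event == 1); impute censored ones (event == 0)."""
    augmented = []
    for y, event, mean in zip(log_times, events, means):
        if event:
            augmented.append(y)
        else:
            augmented.append(impute_censored_log_time(mean, sd, y, rng))
    return augmented


# Toy example: three patients, the last two right-censored.
rng = random.Random(2024)
augmented = augment_log_times([1.2, 0.8, 2.0], [1, 0, 0], [1.0, 1.0, 1.5], 0.5, rng)
```

In a full sampler, `means` would be the current fitted values (mean function plus institution intercept) and `sd` the current error standard deviation, refreshed every iteration.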
To address these problems, the authors propose a flexible random-intercept accelerated failure time model (riAFT-BART) that uses Bayesian additive regression trees (BART) to capture arbitrarily complex relationships between survival time and pre-treatment covariates, and random intercepts to capture cluster-specific main effects. They also develop an interpretable sensitivity analysis approach to evaluate how unmeasured confounding could affect the causal conclusions.
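For reference, a random-intercept AFT model of this kind can be written as follows, where $i$ indexes patients within institution $k$, $A_{ik}$ is the treatment received, $\mathbf{X}_{ik}$ are the pre-treatment covariates, and $f$ is the unknown function given a BART prior. This sketch assumes normal random intercepts and normal errors on the log-time scale; the paper's exact prior specification may differ. The second line is the survival probability implied by the first, since $\log T_{ik}$ is then normal with mean $f(a, \mathbf{X}_{ik}) + b_k$ and variance $\sigma^2$.

$$
\log T_{ik} = f(A_{ik}, \mathbf{X}_{ik}) + b_k + \varepsilon_{ik}, \qquad b_k \sim N(0, \tau^2), \qquad \varepsilon_{ik} \sim N(0, \sigma^2),
$$

$$
\Pr\{T_{ik} > t \mid A_{ik} = a, \mathbf{X}_{ik}, b_k\} = 1 - \Phi\!\left(\frac{\log t - f(a, \mathbf{X}_{ik}) - b_k}{\sigma}\right).
$$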
### Method overview
- **Model construction**: A random-intercept accelerated failure time model (riAFT-BART) in which BART flexibly models the unknown function of treatment and covariates, while random intercepts capture variation between institutions.
- **Posterior inference**: An efficient Markov chain Monte Carlo (MCMC) algorithm draws posterior inferences about the population survival effects of the multiple treatments and characterizes the variability in cluster-level effects.
- **Sensitivity analysis**: An interpretable sensitivity analysis approach evaluates how sensitive the causal conclusions are to the potential magnitude of departure from the assumption of no unmeasured confounding.
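One common way to make such a sensitivity analysis concrete, assumed here since the paper's exact parametrization may differ, is a confounding function $c(a, a', x) = E[\log T(a) \mid A = a, x] - E[\log T(a) \mid A = a', x]$: how much the potential log survival time under treatment $a$ differs between patients who received $a$ and otherwise similar patients who received $a'$. Because $E[\log T(a) \mid x] = E[\log T \mid A = a, x] - \sum_{a' \neq a} \Pr(A = a' \mid x)\, c(a, a', x)$, each observed log time can be shifted by the propensity-weighted confounding terms before refitting the outcome model. The sketch below implements only that adjustment; `adjust_log_times`, the confounding function, and the propensity values are hypothetical inputs, and it is not the riAFTBART implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical confounding function c(a, a_prime, x) on the log-time scale.
ConfoundingFn = Callable[[int, int, Sequence[float]], float]


def adjust_log_times(
    log_times: Sequence[float],
    treatments: Sequence[int],
    covariates: Sequence[Sequence[float]],
    propensities: Sequence[Dict[int, float]],
    c: ConfoundingFn,
) -> List[float]:
    """Subtract sum_{a' != a} P(A = a' | x) * c(a, a', x) from each observed log time."""
    adjusted = []
    for y, a, x, ps in zip(log_times, treatments, covariates, propensities):
        shift = sum(p * c(a, a_prime, x) for a_prime, p in ps.items() if a_prime != a)
        adjusted.append(y - shift)
    return adjusted


# Toy scenario: patients who received treatment 1 would have lived longer under
# treatment 1 than comparable patients who did not (c = 0.2 on the log scale).
def c_example(a: int, a_prime: int, x: Sequence[float]) -> float:
    return 0.2 if a == 1 else 0.0


adjusted = adjust_log_times(
    log_times=[2.0, 1.5],
    treatments=[1, 2],
    covariates=[[0.0], [1.0]],
    propensities=[{1: 0.5, 2: 0.3, 3: 0.2}, {1: 0.4, 2: 0.4, 3: 0.2}],
    c=c_example,
)
```

Repeating the fit over a grid of confounding-function values shows how large the departure from no unmeasured confounding must be before the estimated treatment effects change materially.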
### Application example
The method was applied to NCDB data to compare the effects of the three treatment options on survival among older patients with high-risk localized prostate cancer, to assess the ramifications of potential unmeasured confounding, and to examine variation in effects across institutions.
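Survival effects of this kind can be summarized on the probability scale. Under a normal-error AFT form, each posterior draw implies $\Pr(T > t \mid A = a, x, b_k) = 1 - \Phi((\log t - f(a, x) - b_k)/\sigma)$; averaging this over the sample under each treatment and differencing gives one draw of a survival-probability contrast, for example RP versus EBRT + AD. The stdlib-only sketch below uses hypothetical names (`PosteriorDraw`, `average_survival`, `survival_contrast`) and a made-up mean function standing in for the BART fit. It plugs in each patient's institution intercept, which is one reasonable choice and not necessarily the paper's estimand.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist, fmean
from typing import Callable, Dict, Sequence

STD_NORMAL = NormalDist()  # standard normal, used for Phi


@dataclass
class PosteriorDraw:
    """One hypothetical MCMC draw: mean function f(a, x), intercepts b_k, error sd sigma."""
    f: Callable[[int, Sequence[float]], float]
    b: Dict[int, float]
    sigma: float


def average_survival(t, a, covariates, clusters, draw):
    """Sample average of 1 - Phi((log t - f(a, x_i) - b_k) / sigma) under treatment a."""
    log_t = math.log(t)
    return fmean(
        1.0 - STD_NORMAL.cdf((log_t - draw.f(a, x) - draw.b[k]) / draw.sigma)
        for x, k in zip(covariates, clusters)
    )


def survival_contrast(t, a1, a2, covariates, clusters, draws):
    """Posterior draws of the difference in average survival at time t, a1 versus a2."""
    return [
        average_survival(t, a1, covariates, clusters, d)
        - average_survival(t, a2, covariates, clusters, d)
        for d in draws
    ]


# Toy mean function: treatment 1 lengthens log survival time by 0.5.
def toy_f(a, x):
    return 2.0 + (0.5 if a == 1 else 0.0) - 0.1 * x[0]


X = [[0.0], [1.0], [2.0]]
clusters = [0, 1, 1]  # institution index of each patient
draws = [
    PosteriorDraw(toy_f, {0: 0.0, 1: 0.1}, 0.8),
    PosteriorDraw(toy_f, {0: -0.05, 1: 0.05}, 1.0),
]
contrasts = survival_contrast(math.exp(2.0), 1, 2, X, clusters, draws)
posterior_mean = fmean(contrasts)
```

With real output, the list of contrasts would have one entry per retained MCMC draw, and its quantiles give a credible interval for the survival difference.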
### Conclusion
This paper fills a gap in the existing literature on causal inference with multiple treatments and multilevel, censored survival data, providing a flexible tool (the R package riAFTBART) for evaluating causal relationships in complex medical data.