Abstract:

Background: Survival analysis, also known as 'time to event' analysis, is commonly used in evidence-based medicine to estimate the time until events of interest, such as mortality and disease recurrence, occur. Survival analysis with competing risks is a particularly challenging setting: multiple outcomes are possible during follow-up, and the occurrence of one event can preclude or alter the occurrence of another. Existing studies address this by modeling the relationship between covariates and the distribution of first hitting times for events of interest. However, popular parameter-based methods suffer from an overlooked flaw: competing risks act as confounders that can mislead the model into learning spurious correlations between covariates and events of interest, which degrades performance. Survival analysis models therefore need to be adjusted to mitigate the bias from spurious correlations and obtain more accurate estimates.

Methods: To address the spurious correlations introduced by competing risks, we propose a novel paradigm: Causal Interventional Survival Analysis with competing risks. Formalizing survival analysis under the framework of structural causal models (SCM) shows that competing risks may introduce backdoor paths connecting covariates and events of interest, resulting in spurious correlations. Such backdoor paths can be identified and removed through a causal intervention, namely backdoor adjustment. In this way, only the true causal relations between covariates and events of interest are preserved and used in the probabilistic calculation, so as to yield accurate estimates. This solution is general and can be conveniently implemented and integrated into existing models such as cs-Cox, Fine-Gray, Deep Survival Machine (DSM), DeepHit, and Dynamic-DeepHit.
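The backdoor adjustment mentioned above can be illustrated with a toy discrete example. This is a minimal sketch, not the authors' implementation: the binary variables X (covariate), Y (event of interest) and Z (confounder), and all probabilities, are hypothetical values chosen so that X has no causal effect on Y. The adjustment replaces P(z | x) with P(z) when averaging over the confounder.

```python
# Toy backdoor adjustment with a binary confounder Z.
# All probabilities are made-up numbers for illustration only.

# P(Z = z): marginal distribution of the confounder
p_z = {0: 0.5, 1: 0.5}

# P(X = 1 | Z = z): the confounder influences the covariate
p_x1_given_z = {0: 0.2, 1: 0.8}

# P(Y = 1 | X = x, Z = z): Y depends on Z only, so X has no causal effect
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.1, (0, 1): 0.7, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | x, z) P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def interventional(x):
    """Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


if __name__ == "__main__":
    # Observed association between X and Y (spurious in this toy setup)
    print(observational(1) - observational(0))
    # Causal effect after adjustment (zero in this toy setup)
    print(interventional(1) - interventional(0))
```

In this toy setup the observational contrast is clearly non-zero even though X does not affect Y, while the adjusted contrast vanishes. A real survival model would apply the same averaging over the confounder inside its event-time likelihood rather than to a binary outcome.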
The performance of our solution was evaluated on two inpatient datasets, MIMIC-IV and eICU, and one outpatient dataset, SEER, using five-fold cross-validation on each. The extracted MIMIC-IV dataset included 9,357 patients with 64 covariates and 11 competing risks; the eICU dataset included 15,731 patients with 46 covariates and 11 competing risks; the SEER dataset included 122,815 patients with 19 covariates and 3 competing risks. Each individual in the datasets had at least two competing risks. The clinical significance of the proposed solution was assessed in terms of Concordance-Index (C-Index), Net Reclassification Index (NRI), calibration, and Decision Curve Analysis (DCA) of distinct competing events for all three datasets. In addition, model-agnostic Kernel SHAP (Shapley Additive Explanations) values were calculated for each covariate in the presence of each competing risk, to assess whether the causal intervention could reduce spurious correlations between covariates and events of interest.

Findings: Overall, the five survival analysis models equipped with causal interventions were well calibrated and achieved significant performance gains as measured by C-Index (average performance gain of 4.66% - 11.85% for MIMIC-IV; 15.85% - 19.94% for eICU; 1.28% - 1.98% for SEER) and NRI (average improvement of 0.104 - 0.210 for MIMIC-IV; 0.068 - 0.431 for eICU; 0.014 - 0.026 for SEER) on all three datasets. The calibration and DCA results also demonstrated the effectiveness of the proposed solution. SHAP values obtained from Fine-Gray with and without causal intervention showed that causal intervention helps mitigate spurious correlations and reveal the actual correlations between covariates and events of interest.

Interpretation: We developed a debiasing solution for survival analysis with competing risks.
Our solution learned a debiased model through causal intervention, using backdoor adjustment to remove spurious correlations introduced by competing-risk confounders. Experimental results showed that the causal approach outperformed previous models and reduced bias. The findings suggest that the debiasing solution can alleviate problems of existing models in two ways: by removing or weakening the influence of covariates that are positively correlated with events of interest but have no causal relation to them, and by identifying covariates that are negatively correlated with events of interest but do have a true causal relation.

Funding: We acknowledge support from the National Natural Science Foundation of China (61672450). GPUs were provided by the Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare.

Declaration of Interest: We declare no competing interests.
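For readers unfamiliar with the C-Index reported in the Findings, the following minimal sketch computes Harrell's concordance index for a single event type. It is not the evaluation code used in the study: the function name `harrell_c_index` and the toy data are hypothetical, and treating competing events as censored observations (event flag 0) is an assumption made here for simplicity.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i has an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    subject i also has the higher predicted risk; ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: follow-up times, event flags (1 = event of
    # interest observed, 0 = censored), and predicted risk scores.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.4, 0.5, 0.1]
    print(harrell_c_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a reference point for reading the reported percentage gains.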

Enabling Counterfactual Survival Analysis with Balanced Representations

Debiased machine learning for counterfactual survival functionals based on left-truncated right-censored data

Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

A Framework for Leveraging Machine Learning Tools to Estimate Personalized Survival Curves

Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data

Learning to rank for censored survival data

How Much Time to Survive under Competing Risks: A Causal Debiasing Paradigm

Proximal Causal Inference for Marginal Counterfactual Survival Curves

BITES: Balanced Individual Treatment Effect for Survival data

Causal survival analysis: A guide to estimating intention-to-treat and per-protocol effects from randomized clinical trials with non-adherence

Learning to Bound Counterfactual Inference from Observational, Biased and Randomised Data

Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks

Impact of censoring on learning Bayesian networks in survival modelling

Copula-Based Deep Survival Models for Dependent Censoring

Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees

Learning Decomposed Representation for Counterfactual Inference

Cycle-Balanced Representation Learning For Counterfactual Inference

A Bayesian Approach to Correct for Unmeasured or Semi-Unmeasured Confounding in Survival Data Using Multiple Validation Data Sets

Estimand-based Inference in Presence of Long-Term Survivors

A flexible approach for causal inference with multiple treatments and clustered survival outcomes

A unified framework for bounding causal effects on the always-survivor and other populations