Prediction under interventions: evaluation of counterfactual performance using longitudinal observational data

Ruth H. Keogh, Nan van Geloven
DOI: https://doi.org/10.48550/arXiv.2304.10005
2024-01-11
Abstract: Predictions under interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can give important input to medical decision making. However, evaluating predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply when using observational data, because prediction under interventions involves obtaining predictions of the outcome under conditions that are different to those that are observed for a subset of individuals in the validation dataset. This work describes methods for evaluating counterfactual performance of predictions under interventions for time-to-event outcomes. This means we aim to assess how well predictions would match the validation data if all individuals had followed the treatment strategy under which predictions are made. We focus on counterfactual performance evaluation using longitudinal observational data, and under treatment strategies that involve sustaining a particular treatment regime over time. We introduce an estimation approach using artificial censoring and inverse probability weighting which involves creating a validation dataset that mimics the treatment strategy under which predictions are made. We extend measures of calibration, discrimination (c-index and cumulative/dynamic AUCt) and overall prediction error (Brier score) to allow assessment of counterfactual performance. The methods are evaluated using a simulation study, including scenarios in which the methods should detect poor performance. Applying our methods in the context of liver transplantation shows that our procedure allows quantification of the performance of predictions supporting crucial decisions on organ allocation.
Methodology
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses the problem of evaluating predictive performance under interventions. Specifically, it asks how longitudinal observational data can be used to assess the accuracy of individual risk predictions made under a specific treatment strategy. Standard methods for evaluating predictive performance do not apply in this setting, because the treatment conditions under which predictions are made differ from those actually observed for some individuals in the validation dataset. The paper therefore proposes a new approach: artificial censoring and inverse probability weighting are used to create a validation dataset that mimics the treatment strategy of interest, which makes it possible to evaluate the counterfactual performance of the predictions.

### Key point summary

1. **Background**:
   - Predicting an individual's risk under a specific treatment strategy is crucial for medical decision-making.
   - Standard prediction models cannot provide this information because they usually target the observed outcome distribution.
   - Longitudinal observational data (such as electronic health records) are the main data source for developing these prediction models, but confounding must be dealt with.
2. **Challenges**:
   - The counterfactual outcomes of individuals under different treatment strategies cannot be directly observed.
   - When observational data are used to evaluate predictive performance, standard methods are not valid because the conditions under which predictions are made differ from the conditions actually observed.
3. **Solutions**:
   - A method based on artificial censoring and inverse probability weighting is proposed to generate a validation dataset that mimics a specific treatment strategy.
   - Measures of calibration, discrimination (the c-index and cumulative/dynamic AUCt), and overall prediction error (the Brier score) are extended to evaluate counterfactual performance.
4. **Methods**:
   - **Artificial censoring**: In the validation data, an individual's follow-up is censored at the time they deviate from the treatment strategy of interest.
   - **Inverse probability weighting**: Individuals who remain uncensored are weighted so that the weighted sample represents the situation in which all individuals follow the treatment strategy.
   - **Performance evaluation**: Weighted Kaplan-Meier analysis and the weighted c-index, AUCt, and Brier score are used to evaluate predictive performance.
5. **Application and validation**:
   - The proposed methods were evaluated in a simulation study, including scenarios in which they should detect poor performance.
   - They were applied to liver transplantation data to show how predictive performance can be evaluated under different treatment strategies.

### Formula examples

- **Inverse probability of artificial censoring weight (IPACW)**:
  \[ G^{-1}_{a_0}(t|\mathbf{L})=\prod_{s=0}^{\lfloor t\rfloor}\left(\frac{1}{\Pr(A_s=a_s|\bar{A}_{s-1}=\bar{a}_{s-1},\bar{L}_s)}\right) \]
- **Weighted Brier score**:
  \[ \hat{BS}_{a_0}(t)=\frac{1}{n}\sum_{i=1}^{n}\left(I(\tilde{T}_{a_0i}\leq t)-\hat{R}_{a_0i}(t|\mathbf{X}_i)\right)^2W^{(2)}_{a_0i} \]
  where
  \[ W^{(2)}_{a_0i}=\frac{I(\tilde{T}_{a_0i}\leq t,\tilde{D}_{a_0i}=1)}{\hat{G}_{a_0c}(\tilde{T}_{a_0i}|\mathbf{L}_i)}+\frac{I(\tilde{T}_{a_0i}>t)}{\hat{G}_{a_0c}(t|\mathbf{L}_i)} \]

### Conclusion

The paper proposes a new method for evaluating predictive performance under specific treatment strategies using longitudinal observational data. The method uses artificial censoring and inverse probability weighting to create a validation dataset that mimics a specific treatment strategy, and it extends existing performance measures accordingly. Simulation studies and a practical application demonstrate the effectiveness and practicality of the method. This provides important support for medical decision-making, especially in...
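
The artificial censoring, weighting, and weighted Brier score steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes treatment is recorded at integer visit times `0..K-1`, that the strategy is to sustain a single treatment value, and that the probabilities of following the strategy at each visit (`prob_follow`) have already been estimated by some treatment model. Here they are hypothetical constants. The function names and the toy data are illustrative, and censoring for reasons other than deviation from the strategy is ignored.

```python
import numpy as np

def artificially_censor(event_time, event, treatment, strategy):
    """Censor follow-up at the first visit where treatment deviates from the strategy.

    treatment has shape (n, K): treatment status at integer visit times 0..K-1.
    """
    n_visits = treatment.shape[1]
    deviates = treatment != strategy
    # Visit index of the first deviation, or K if the strategy is always followed.
    first_dev = np.where(deviates.any(axis=1), deviates.argmax(axis=1), n_visits)
    first_dev = first_dev.astype(float)
    censored = first_dev < event_time
    new_time = np.where(censored, first_dev, event_time)
    new_event = np.where(censored, 0, event)
    return new_time, new_event

def ip_weight(prob_follow, time):
    """Product over visits s = 0..floor(time) of 1 / Pr(following the strategy at s)."""
    n, n_visits = prob_follow.shape
    time = np.broadcast_to(np.asarray(time, dtype=float), (n,))
    last_visit = np.minimum(np.floor(time).astype(int), n_visits - 1)
    cumulative = np.cumprod(1.0 / prob_follow, axis=1)
    return cumulative[np.arange(n), last_visit]

def weighted_brier(time, event, risk, prob_follow, t):
    """Weighted Brier score at horizon t on the artificially censored data."""
    had_event = (time <= t) & (event == 1)
    still_at_risk = time > t
    # Weights are evaluated at the event time for events and at t otherwise.
    eval_time = np.where(had_event, time, t)
    # Individuals censored before t receive weight zero.
    weight = (had_event | still_at_risk) * ip_weight(prob_follow, eval_time)
    return np.mean(weight * (had_event.astype(float) - risk) ** 2)

# Toy data (hypothetical): 4 individuals, 3 visits, strategy "never treated" (0).
treatment = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]])
event_time = np.array([1.5, 2.5, 3.0, 2.0])
event = np.array([1, 1, 0, 1])
prob_follow = np.full((4, 3), 0.5)   # assumed output of a treatment model
risk = np.full(4, 0.5)               # assumed predicted risks by time t = 2

cens_time, cens_event = artificially_censor(event_time, event, treatment, strategy=0)
brier = weighted_brier(cens_time, cens_event, risk, prob_follow, t=2.0)
print(cens_time, cens_event, brier)
```

In this sketch an individual who deviates from the strategy contributes nothing after the deviation, while those who keep following it are up-weighted to stand in for them. The weighted c-index and AUCt would reuse the same censored data and weights.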