Robust Real-Time Mortality Prediction in the Intensive Care Unit using Temporal Difference Learning

Thomas Frost, Kezhi Li, Steve Harris
2024-11-07
Abstract: The task of predicting long-term patient outcomes using supervised machine learning is a challenging one, in part because of the high variance of each patient's trajectory, which can result in the model over-fitting to the training data. Temporal difference (TD) learning, a common reinforcement learning technique, may reduce variance by generalising learning to the pattern of state transitions rather than terminal outcomes. However, in healthcare this method requires several strong assumptions about patient states, and there appears to be limited literature evaluating the performance of TD learning against traditional supervised learning methods for long-term health outcome prediction tasks. In this study, we define a framework for applying TD learning to real-time irregularly sampled time series data using a Semi-Markov Reward Process. We evaluate the model framework in predicting intensive care mortality and show that TD learning under this framework can result in improved model robustness compared to standard supervised learning methods, and that this robustness is maintained even when validated on external datasets. This approach may offer a more reliable method when learning to predict patient outcomes using high-variance irregular time series data.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to address the challenges of predicting the long-term mortality of patients in the intensive care unit (ICU). Specifically, the authors focus on the following key issues:

1. **Overfitting Caused by High-Variance Data**: The disease-course trajectories of ICU patients are highly complex and significantly different, which makes traditional supervised-learning-based methods prone to overfitting the training data. The performance of the model may drop significantly during external validation.
2. **Accuracy of Real-Time and Long-Term Predictions**: Most current prediction models are limited to one-time prediction at admission or short-term (≤72 hours) prediction, lacking the ability to make real-time, continuous predictions throughout the hospital stay. Moreover, existing models perform poorly during external validation.
3. **Limitations of Traditional Supervised-Learning Methods**: Although supervised-learning-based methods (such as gradient-boosting ensembles, artificial neural networks, etc.) show relatively high AUROC scores (0.80-0.95) in some cases, they often lack high-quality external validation and are unstable in long-term prediction.

To solve these problems, the authors propose a new framework that utilizes **Temporal Difference (TD) learning** to handle irregularly-sampled time-series data. TD learning reduces variance through the bootstrapping method, thereby improving the generalization ability and robustness of the model. Specifically, the main contributions of this study include:

- Proposing a TD-learning framework based on the Semi-Markov Reward Process (Semi-MRP).
- Using the MIMIC-IV and Salzburg ICU datasets for internal and external validation, demonstrating the superior performance of the TD-learning model in long-term mortality prediction.
- Experimental results show that the TD-learning model not only performs excellently in internal validation but also maintains high accuracy and stability in external validation.

Through these improvements, this study provides a more reliable method for predicting the long-term mortality of ICU patients, especially when dealing with high-variance and irregular time-series data.
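
To make the bootstrapping idea concrete, the following is a minimal tabular sketch of a TD(0) update on irregularly spaced observations. It is an illustration under stated assumptions, not the authors' model. It assumes the common semi-Markov convention of discounting the next state's value by `gamma_per_hour ** dt_hours`, a reward of 1 on the terminal (death) transition and 0 otherwise, and integer state indices. The names `td_target`, `td0_update`, `gamma_per_hour`, and `lr` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# One observed transition (all fields are illustrative assumptions):
# (state, next_state, hours_elapsed, reward, is_terminal)
Transition = Tuple[int, int, float, float, bool]


def td_target(reward: float, next_value: float, dt_hours: float,
              terminal: bool, gamma_per_hour: float = 0.99) -> float:
    """Semi-Markov TD(0) target for a single transition.

    The next state's value is discounted by the elapsed time between
    observations, so unevenly spaced measurements are handled directly.
    On a terminal transition there is no next state to bootstrap from.
    """
    if terminal:
        return reward
    return reward + (gamma_per_hour ** dt_hours) * next_value


def td0_update(values: List[float], transitions: List[Transition],
               lr: float = 0.1, gamma_per_hour: float = 0.99) -> List[float]:
    """Apply one in-place TD(0) sweep over the transitions and return values."""
    for state, next_state, dt_hours, reward, terminal in transitions:
        target = td_target(reward, values[next_state], dt_hours,
                           terminal, gamma_per_hour)
        # Move the estimate a fraction lr of the way towards the target.
        values[state] += lr * (target - values[state])
    return values


if __name__ == "__main__":
    # Toy episode: state 1 ends in death after 6 hours; state 0 precedes
    # state 1 by 2.5 hours. A supervised model would train both states on
    # the final label; TD trains state 0 on the estimate for state 1.
    episode = [(1, 1, 6.0, 1.0, True), (0, 1, 2.5, 0.0, False)]
    print(td0_update([0.0, 0.0], episode))
```

The design point is that only the terminal transition sees the outcome label. Earlier states learn from the current estimate of the state that followed them, which is the mechanism the summary credits with reducing variance relative to regressing every time step on the final outcome.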