A Kalman Filter Based Framework for Monitoring the Performance of In-Hospital Mortality Prediction Models Over Time

Jiacheng Liu, Lisa Kirkland, Jaideep Srivastava
2024-02-10
Abstract: Unlike in a clinical trial, where researchers get to determine the minimum number of positive and negative samples required, or in a machine learning study, where the size and class distribution of the validation set are static and known, in a real-world scenario there is little control over the size and distribution of incoming patients. As a result, when measured during different time periods, evaluation metrics like Area Under the Receiver Operating Characteristic Curve (AUCROC) and Area Under the Precision-Recall Curve (AUCPR) may not be directly comparable. Therefore, in this study, for binary classifiers running over a long time period, we propose to adjust these performance metrics for sample size and class distribution, so that a fair comparison can be made between two time periods. Note that the number of samples and the class distribution, namely the ratio of positive samples, are two robustness factors which affect the variance of AUCROC. To better estimate the mean of performance metrics and understand the change of performance over time, we propose a Kalman filter based framework with extrapolated variance adjusted for the total number of samples and the number of positive samples during different time periods. The efficacy of this method is demonstrated first on a synthetic dataset and then retrospectively applied to a 2-day-ahead in-hospital mortality prediction model for COVID-19 patients during 2021 and 2022. Further, we conclude that our prediction model is not significantly affected by the evolution of the disease, improved treatments and changes in hospital operational plans.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to monitor the performance of a deployed prediction model over time in a hospital setting, specifically for binary classifiers such as an in-hospital mortality predictor. In a clinical trial or a machine learning study, model performance is typically evaluated on a validation set whose size and class distribution are static and known. In the real world, however, the number of newly admitted patients and the proportion of positive cases change continuously, so evaluation metrics such as AUCROC (area under the receiver operating characteristic curve) measured in different time periods are not directly comparable. The authors therefore propose a one-dimensional Kalman filter framework in which the variance used at each time step is adjusted for that period's total sample size and number of positive samples. This yields a better estimate of the mean of the performance metric and a clearer picture of how performance changes over time. The paper demonstrates the method first on a synthetic dataset and then in a retrospective application to a 2-day-ahead in-hospital mortality prediction model for COVID-19 patients during 2021 and 2022. The authors conclude that the model's performance was not significantly affected by the evolution of the disease, improved treatments, or changes in hospital operational plans. The study also discusses practical issues that arise when monitoring prediction models, such as choosing p-value thresholds and handling multiple comparisons, and suggests that the proposed framework can mitigate them. Finally, the paper suggests that future work should consider applying the filter to AUCPR (area under the precision-recall curve), since sample size and class distribution, the factors that dominate the robustness of AUCROC, may influence AUCPR as well.
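To make the idea concrete, here is a minimal sketch of a one-dimensional Kalman filter that tracks AUCROC across time periods and trusts each period's observation less when it has few samples or few positives. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not give the paper's variance extrapolation, so this sketch substitutes the Hanley-McNeil (1982) approximation for the variance of an AUCROC estimate. The function names, the random-walk state model, and the values of `process_var`, `init_mean`, and `init_var` are all hypothetical choices.

```python
def hanley_mcneil_variance(auc, n_pos, n_neg):
    """Approximate variance of an AUCROC estimate (Hanley & McNeil, 1982).

    Assumption: used here as a stand-in for the paper's own variance
    adjustment, which is not specified in the text above.
    """
    if n_pos < 1 or n_neg < 1:
        raise ValueError("need at least one positive and one negative sample")
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    numerator = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    )
    return numerator / (n_pos * n_neg)


def kalman_filter_auc(observations, process_var=1e-4, init_mean=0.5, init_var=1.0):
    """Filter a sequence of per-period AUCROC measurements.

    observations: iterable of (auc, n_total, n_pos) tuples, one per period.
    Returns a list of (filtered_mean, filtered_variance), one per period.
    The state model is a random walk (hypothetical choice): the true AUCROC
    is assumed to drift slowly, with variance process_var per step.
    """
    mean, var = init_mean, init_var
    history = []
    for auc, n_total, n_pos in observations:
        # Predict step: the mean carries over, uncertainty grows slightly.
        var_pred = var + process_var
        # Measurement noise depends on this period's size and class balance.
        meas_var = hanley_mcneil_variance(auc, n_pos, n_total - n_pos)
        # Update step: small or imbalanced periods get a small gain.
        gain = var_pred / (var_pred + meas_var)
        mean = mean + gain * (auc - mean)
        var = (1.0 - gain) * var_pred
        history.append((mean, var))
    return history


# Hypothetical data: the second period has only 30 patients and 2 deaths.
periods = [(0.80, 500, 50), (0.60, 30, 2), (0.81, 600, 55)]
for period, (mean, var) in enumerate(kalman_filter_auc(periods), start=1):
    print(f"period {period}: filtered AUCROC = {mean:.3f}, variance = {var:.5f}")
```

In this made-up example, the low AUCROC observed in the tiny second period moves the filtered estimate only slightly, because its measurement variance is much larger than that of the neighboring periods. This is the sense in which the filter makes periods with different sample sizes and class distributions comparable.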