Machine-Learning-Enabled Prognostic Models for Sepsis

Chunyan Li, Lu Wang, Kexun Li, Hongfei Deng, Yu Wang, Li Chang, Ping Zhou, Jun Zeng, Mingwei Sun, Hua Jiang, Qi Wang
DOI: https://doi.org/10.2139/ssrn.4336961
2024-01-01
Intelligence-Based Medicine
Abstract:
Background and Objectives: Sepsis is a common cause of death in intensive care units (ICUs). A reliable prognostic model based on patient data acquired in the ICU would help clinicians make treatment decisions that improve clinical outcomes for septic patients. This study aims to develop a machine-learning framework for building such prognostic tools from the class-imbalanced longitudinal data of a group of septic patients.
Methods: A feature-represented input dataset is devised in the form of concatenated triplets to increase the data size relative to the dimension of the feature space. Each concatenated triplet consists of a patient's static data, the longitudinal data collected over k consecutive days (k = 2, 3, 4, 5), and the clinical outcome. The structured input data are then used to train classifiers in combination with appropriate feature engineering techniques. The trained classifiers are tested on a new set of septic patients to ensure their clinical efficacy. We implement the modeling approach using five classifiers, K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest (RF), and Extreme Gradient Boosting (XGBoost), coupled with a set of feature engineering techniques. AUROC and a new metric, $\gamma$, made up of the F1 score on the external validation set, are used to assess the efficacy of the models.
Results: Five prognostic models are built on the feature-represented input dataset, accounting for 10 selected dynamic features from the patient medical data. The XGBoost (AUROC=0.777, F1 score=0.694) and RF (AUROC=0.769, F1 score=0.647) models combined with the ensemble under-sampling strategy outperform all other models in the external validation (testing).
For example, compared with the same models trained without the sampling strategy, the improvements in AUROC and in overfitting are (6.66\%, 54.96\%) for the RF model and (0.52\%, 77.72\%) for the XGBoost model with the sampling strategy. This indicates that the machine-learning framework can greatly improve the accuracy and generalizability of standard classifiers.
Conclusion: A new modeling framework is devised to develop prognostic tools for treatment outcomes of septic patients using small, class-imbalanced, and high-dimensional datasets. By engineering new structured datasets encoded with temporal features and combining them with sampling strategies and dimension reduction techniques, it enables standard classifiers to achieve relatively high predictive performance on small datasets. The framework provides clinically useful prognostic models and sets an example for applying machine learning methods to small data problems in medicine.
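As an illustration of the concatenated-triplet input described in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes that windows of k consecutive days overlap with a stride of one day, so that a patient with more than k recorded days contributes several rows; the abstract does not state this. The function name `build_rows`, the feature values, and the feature layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# One training row: (static features + k days of dynamic features, outcome label).
Row = Tuple[List[float], int]


def build_rows(
    static: Sequence[float],
    daily: Sequence[Sequence[float]],
    outcome: int,
    k: int,
) -> List[Row]:
    """Slide a window of k consecutive days over one patient's record.

    Each window yields one row: the static features followed by the
    flattened k-day block of dynamic features, paired with the outcome.
    Assumption: windows overlap with a stride of one day.
    """
    rows: List[Row] = []
    for start in range(len(daily) - k + 1):
        window = daily[start:start + k]
        features = list(static) + [value for day in window for value in day]
        rows.append((features, outcome))
    return rows


# Hypothetical patient: 2 static features, 4 ICU days, 2 dynamic features per day.
static = [63.0, 1.0]
daily = [[7.1, 88.0], [7.4, 92.0], [6.9, 85.0], [6.5, 80.0]]
rows = build_rows(static, daily, outcome=1, k=3)

for features, label in rows:
    print(len(features), label)
```

Under this assumption, a patient with d recorded days yields d - k + 1 rows of fixed length, which is one way the number of training samples can grow relative to the dimension of the feature space. A patient with fewer than k recorded days yields no rows.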