A scoping review of machine learning for sepsis prediction- feature engineering strategies and model performance: a step towards explainability

Sherali Bomrah,Mohy Uddin,Umashankar Upadhyay,Matthieu Komorowski,Jyoti Priya,Eshita Dhar,Shih-Chang Hsu,Shabbir Syed-Abdul
DOI: https://doi.org/10.1186/s13054-024-04948-6
IF: 15.1
2024-05-29
Critical Care
Abstract: Sepsis, an acute and potentially fatal systemic response to infection, significantly impacts global health by affecting millions annually. Prompt identification of sepsis is vital, as treatment delays lead to increased fatalities through progressive organ dysfunction. While recent studies have delved into leveraging Machine Learning (ML) for predicting sepsis, focusing on aspects such as prognosis, diagnosis, and clinical application, there remains a notable deficiency in the discourse regarding feature engineering. Specifically, the role of feature selection and extraction in enhancing model accuracy has been underexplored.
critical care medicine
What problem does this paper attempt to address?
The paper addresses feature engineering strategies for predicting sepsis with machine learning (ML) and their impact on model performance. It has two main objectives:

1. **Identify Key Features**: Determine the key features used in various machine learning models for predicting sepsis, providing insights for future model development.
2. **Evaluate Model Performance**: Assess the performance of these models using metrics such as AUROC (Area Under the Receiver Operating Characteristic Curve), sensitivity, and specificity.

### Background

Sepsis is an acute and potentially fatal systemic response triggered by infection. It affects millions of people annually and causes a significant number of deaths. Timely identification is crucial, because delayed treatment allows organ dysfunction to progress and increases mortality. Although many recent studies have used machine learning to predict sepsis, research on feature engineering is relatively scarce. In particular, the role of feature selection and feature extraction in improving model accuracy has not been fully explored.

### Objectives

1. **Explore Feature Engineering Strategies**: Analyze the feature engineering strategies used in machine learning models for sepsis prediction, providing information for future research and model development.
2. **Evaluate Model Performance**: Critically analyze existing studies to evaluate the performance of these models, focusing on metrics such as AUROC, sensitivity, and specificity.

### Methods

- **Literature Search Strategy**: A comprehensive literature search of the PubMed, Embase, and Scopus databases was conducted according to PRISMA guidelines, screening relevant studies from the past 5 years.
- **Inclusion and Exclusion Criteria**: The review included English-language studies published in peer-reviewed journals that focused on sepsis prediction, particularly those emphasizing feature optimization in machine learning models. It excluded conference abstracts, preliminary proof-of-concept studies, and studies predicting only sepsis-related mortality.
- **Data Extraction and Quality Assessment**: Two primary reviewers extracted key information, including study objectives, clinical settings, patient cohort size, machine learning models used, number of features, observation period, gender distribution, AUROC, innovation, and model evaluation criteria. Two additional reviewers checked and validated the extracted information.

### Results

- **Study Characteristics**: A total of 29 studies were included, covering 1,147,202 patients. These studies were conducted primarily in clinical settings such as Intensive Care Units (ICU) and Emergency Departments (ED), using multiple database sources.
- **Feature Engineering Techniques**:
  - **Feature Selection Methods**: The studies used filter, wrapper, and embedded methods. Filter methods select features through variable ranking techniques, wrapper methods evaluate feature subsets by the model performance they yield, and embedded methods integrate feature selection directly into model training.
  - **Feature Extraction Methods**: The studies used LSTM networks to extract features from time-series data, and built second-order derivative features and aggregated features to capture complex relationships and compress data.

### Conclusion

- **Key Dynamic Indicators**: Vital signs and key laboratory values are crucial for early detection of sepsis.
- **Feature Selection Methods**: Applying feature selection methods significantly improved model accuracy, with models like Random Forest and XGBoost showing good results.
- **Deep Learning Models**: Deep learning models offered additional insights and highlighted the important role of feature engineering in sepsis prediction, which could benefit clinical practice.

Through this comprehensive review, the paper aims to provide a systematic understanding of feature engineering for sepsis prediction models, thereby promoting more effective clinical decision-making and patient care.
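
The filter-style feature selection and the evaluation metrics summarized above can be illustrated with a minimal sketch. It ranks each candidate feature on its own by univariate AUROC, which is one possible variable-ranking criterion and not necessarily the one used in the reviewed studies, and then computes sensitivity and specificity at a fixed threshold. The feature names, values, labels, and the threshold of 100 are all hypothetical toy data chosen for illustration; none of them come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A case is predicted positive when its score is >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def rank_features_by_auroc(
    features: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Filter-style ranking: score every feature alone, without training a model."""
    ranked = []
    for name, values in features.items():
        auc = auroc(labels, values)
        # A feature that is lower in septic patients is as informative as one
        # that is higher, so the direction of the association is folded away.
        ranked.append((name, max(auc, 1.0 - auc)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical toy cohort: 1 = sepsis, 0 = no sepsis.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "heart_rate": [72, 80, 85, 90, 88, 110, 118, 125],
    "lactate": [1.0, 1.2, 0.9, 1.5, 2.8, 3.5, 2.1, 4.0],
    "age": [60, 45, 70, 55, 50, 65, 58, 62],
}

ranking = rank_features_by_auroc(features, labels)
sens, spec = sensitivity_specificity(labels, features["heart_rate"], threshold=100)

print(ranking)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data the ranking places `lactate` first, `heart_rate` second, and `age` last, and a heart-rate threshold of 100 gives a sensitivity of 0.75 with a specificity of 1.00. A wrapper method would instead retrain a model on candidate feature subsets and compare their performance, and an embedded method would rely on the model's own training procedure, for example tree-based feature importance.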