Development and validation of machine learning-based prediction model for severe pneumonia: A multicenter cohort study

Zailin Yang, Shuang Chen, Xinyi Tang, Jiao Wang, Ling Liu, Weibo Hu, Yulin Huang, Jian'e Hu, Xiangju Xing, Yakun Zhang, Jun Li, Haike Lei, Yao Liu
DOI: https://doi.org/10.1016/j.heliyon.2024.e37367
IF: 3.776
2024-09-03
Heliyon
Abstract: Severe pneumonia (SP) is a prevalent respiratory ailment characterized by high mortality and poor prognosis. Current scoring systems for pneumonia are time-consuming and have limited ability to predict SP early. To address this gap, this study aimed to develop a machine-learning model that uses inflammatory markers from peripheral blood for early prediction of SP. A total of 204 pneumonia patients from seven medical centers were studied, with 143 (68 SP cases) in the training cohort and 61 (32 SP cases) in the test cohort. Clinical characteristics and laboratory test results were collected at diagnosis. Six models (Logistic Regression, Random Forest, Naïve Bayes, XGBoost, Support Vector Machine, and Decision Tree) were built and evaluated. Nine predictors (age, sex, WBC count, T-lymphocyte count, NLR, CRP, TNF-α, IL-4/IFN-γ ratio, and IL-6/IL-10 ratio) were selected through LASSO regression and clinical insight. The XGBoost model performed best, achieving an AUC of 0.901 (95% CI: 0.827 to 0.985) in the test cohort, with an accuracy of 0.803, sensitivity of 0.844, specificity of 0.759, and F1 score of 0.818. SHAP analysis identified elevated WBC count, older age, and elevated CRP as the top predictors. The use of inflammatory biomarkers in this concise predictive model shows significant potential for the rapid assessment of SP risk, thereby facilitating timely preventive interventions.
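The reported test-cohort metrics can be checked for internal consistency with a short calculation. The sketch below is illustrative only: the abstract gives the cohort sizes (61 patients, 32 with SP) and the four metrics, but not the confusion matrix. The counts `tp = 27` and `tn = 22` are assumptions, back-computed as the integers that reproduce the reported sensitivity and specificity; they are not values stated by the authors.

```python
# Test-cohort sizes as reported in the abstract.
n_test = 61               # all test-cohort patients
n_sp = 32                 # severe pneumonia (positive) cases
n_non_sp = n_test - n_sp  # 29 non-severe (negative) cases

# ASSUMPTION: these counts are not reported in the abstract. They are the
# integers that reproduce the published sensitivity and specificity.
tp = 27             # SP cases predicted as SP
tn = 22             # non-SP cases predicted as non-SP
fn = n_sp - tp      # SP cases missed by the model
fp = n_non_sp - tn  # non-SP cases flagged as SP

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n_test
f1 = 2 * tp / (2 * tp + fp + fn)

metrics = {
    "sensitivity": sensitivity,
    "specificity": specificity,
    "accuracy": accuracy,
    "F1 score": f1,
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, all four values round to the figures quoted in the abstract (0.844, 0.759, 0.803, and 0.818), so the reported metrics are mutually consistent with a single confusion matrix.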