Machine learning prediction models for depression symptoms among Chinese health care workers during the COVID-19 outbreak (Preprint)

Zhaohe Zhou,Dan Luo,Bing Xiang Yang,Zhongchun Liu
DOI: https://doi.org/10.2196/preprints.36814
2022-01-01
Abstract: Background: The depression symptoms of health care workers related to COVID-19 have received worldwide recognition. Although many studies have identified risk exposures associated with depression symptoms among health care workers, few have focused on predictive models built with machine learning methods. Governments and organizations are concerned about the need for immediate interventions and alert systems for health care workers who are mentally at risk. This study aims to develop and validate machine learning-based models for predicting depression symptoms using survey data collected during the COVID-19 outbreak in China. Methods: Surveys were conducted among 2,574 health care workers in hospitals designated to care for COVID-19 patients. The outcome measure was a score of >=5 on the Patient Health Questionnaire. Descriptive statistics were used to describe the data. Four machine learning approaches were developed (75% of data) and validated (25% of data) using cross-validation with 100 repetitions to identify important predictors of depression symptoms: (1) decision tree, (2) logistic regression with least absolute shrinkage and selection operator (LASSO), (3) random forest, and (4) gradient-boosting tree. Finally, all models were compared to evaluate their predictive performance and screening utility. Results: Descriptive statistics showed that 46.11% of health care workers experienced depression symptoms. The important risk predictors identified and ranked by the machine learning models were highly consistent: self-perceived health status factors always occupied the top five places among the most important predictors, followed by worry about infection, working on the frontline, a very high level of uncertainty, having received any form of psychological support material, and having COVID-like symptoms.
The C-statistics [95% CI] of the machine learning models were as follows: LASSO model, 0.824 [0.792-0.856]; random forest, 0.828 [0.797-0.859]; gradient-boosting tree, 0.829 [0.798-0.861]; and decision tree, 0.785 [0.752-0.819]. The calibration plot indicated that the LASSO model, random forest, and gradient-boosting tree fit the data well. Decision curve analysis showed that all models obtained net benefits for predicting depression symptoms. Conclusions: This study shows that machine learning prediction models are suitable for making predictions about mentally at-risk health care workers in a public health emergency setting. The application of machine learning models could support hospitals’ and health care workers’ decision-making on possible psychological interventions and proper mental health management; however, substituting them for conventional tools in general diagnostic settings requires further development and comparison.
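The two evaluation quantities named in the abstract, the C-statistic and the net benefit used in decision curve analysis, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say how these values were computed, the function names `c_statistic` and `net_benefit` are hypothetical, and the eight labels and scores are invented toy values rather than study data. The sketch assumes the standard definitions: the C-statistic is the share of positive/negative pairs in which the positive case receives the higher predicted score (ties count as one half), and the net benefit at threshold probability p_t is TP/n - (FP/n) * p_t / (1 - p_t).

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (positive, negative) pairs where the positive case scores higher.

    Tied scores count as half a concordant pair. For a binary outcome this
    equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy labels (1 = depression symptoms present) and predicted risks.
# These values are invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("C-statistic:", c_statistic(toy_labels, toy_scores))
print("Net benefit at 0.5:", net_benefit(toy_labels, toy_scores, 0.5))
```

In a decision curve, the net benefit is evaluated over a range of thresholds and compared with the "treat all" and "treat none" strategies; a model is useful at a given threshold when its net benefit exceeds both.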