A Comparative Analysis of Statistical and Machine Learning Models

Shannon Holcroft, Innocent Karangwa, Francesca Little, Joelle Behoor, Oliva Bazirete
DOI: https://doi.org/10.3390/ijerph21050600
IF: 4.614
2024-05-08
International Journal of Environmental Research and Public Health
Abstract: Postpartum haemorrhage (PPH) is a significant cause of maternal morbidity and mortality worldwide, particularly in low-resource settings. This study aimed to develop a predictive model for PPH using early risk factors and rank their importance in terms of predictive ability. The dataset was obtained from an observational case–control study in northern Rwanda. Various statistical models and machine learning techniques were evaluated, including logistic regression, logistic regression with elastic-net regularisation, Random Forests, Extremely Randomised Trees, and gradient-boosted trees with XGBoost. The Random Forest model, with an average sensitivity of 80.7%, specificity of 71.3%, and a misclassification rate of 12.19%, outperformed the other models, demonstrating its potential as a reliable tool for predicting PPH. The important predictors identified in this study were haemoglobin level during labour and maternal age. However, there were differences in PPH risk factor importance in different data partitions, highlighting the need for further investigation. These findings contribute to understanding PPH risk factors, highlight the importance of considering different data partitions and implementing cross-validation in predictive modelling, and emphasise the value of identifying the appropriate prediction model for the application. Effective PPH prediction models are essential for improving maternal health outcomes on a global scale. This study provides valuable insights for healthcare providers to develop predictive models for PPH to identify high-risk women and implement targeted interventions.
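The three metrics quoted in the abstract (sensitivity, specificity, misclassification rate) all derive from a binary confusion matrix. The sketch below shows one way they could be computed for a single data partition. The function name `classification_metrics` and the labels are hypothetical and made up for illustration; they are not the authors' code or data, and the sketch assumes both classes are present in `y_true`.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and misclassification rate for binary labels.

    Assumes 1 marks a PPH case and 0 a control, and that both classes
    occur in y_true (otherwise a denominator would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases flagged as cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls flagged as controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged as cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases that were missed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification_rate": (fp + fn) / len(pairs),
    }


# Made-up labels for illustration only; not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Averaging these values over several cross-validation partitions would give figures comparable in kind to the "average sensitivity" the abstract reports, though the study's exact partitioning scheme is not stated here.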
Public, environmental & occupational health; environmental sciences