Application of machine learning algorithm incorporating dietary intake in prediction of gestational diabetes mellitus
Tianze Ding, Peijie Liu, Jie Jia, Hui Wu, Jie Zhu, Kefeng Yang
DOI: https://doi.org/10.1530/ec-24-0169
2024-10-10
Endocrine Connections
Abstract:
Introduction: Gestational diabetes mellitus (GDM) significantly affects pregnancy outcomes. Prediction models are therefore important, because they can guide timely interventions that reduce the incidence of GDM and its associated adverse effects.
Methods: A total of 554 pregnant women were included, and their sociodemographic characteristics, clinical data and dietary data were collected. Dietary intake was assessed with a validated semi-quantitative food frequency questionnaire (FFQ). Features were selected using random forest mean decrease impurity (MDI), and prediction models were built with Logistic Regression, XGBoost and LightGBM. Model performance was compared using accuracy, sensitivity, specificity, area under the curve (AUC) and the Hosmer-Lemeshow test.
Results: Blood glucose, age, pre-pregnancy body mass index (BMI), triglycerides and high-density lipoprotein cholesterol (HDL) were the five most important features identified by feature selection. Among the three algorithms, XGBoost performed best (AUC = 0.788), followed by LightGBM (AUC = 0.749), while Logistic Regression performed worst (AUC = 0.712). Both XGBoost and LightGBM performed better when dietary information was included than on the dataset without dietary data (AUC 0.788 vs. 0.718 for XGBoost; 0.749 vs. 0.726 for LightGBM).
Conclusion: XGBoost and LightGBM outperformed Logistic Regression in predicting GDM among Chinese pregnant women. Dietary data may also improve model performance, which warrants further investigation in a larger sample.
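The evaluation metrics listed in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the 0.5 classification threshold and the use of equal-size risk groups for the Hosmer-Lemeshow test are assumptions, because the abstract does not report these details. AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen GDM case receives a higher predicted risk than a randomly chosen non-GDM case.

```python
"""Hypothetical helpers for the evaluation metrics named in the abstract.

Assumes binary labels (1 = GDM, 0 = no GDM) and predicted probabilities.
Standard library only; an illustrative sketch, not the study's pipeline.
"""
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding probabilities (threshold is assumed)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_sensitivity_specificity(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate among GDM cases
    specificity = tn / (tn + fp)  # true negative rate among non-GDM cases
    return accuracy, sensitivity, specificity


def auc(y_true, y_prob):
    """ROC AUC as P(case score > control score), counting ties as 1/2."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-size risk groups.

    Returns (statistic, p_value) with df = groups - 2. The chi-square tail
    below is the exact closed form for even df, so `groups` must be even.
    """
    if groups < 4 or groups % 2 != 0:
        raise ValueError("groups must be an even number >= 4")
    pairs = sorted(zip(y_prob, y_true))  # order subjects by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)   # observed GDM cases in group
        expected = sum(p for p, _ in chunk)   # expected cases = sum of risks
        size = len(chunk)
        # (O - E)^2 / (E * (1 - E / n_g)) combines the event and non-event terms
        variance = expected * (1 - expected / size)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    k = (groups - 2) // 2
    half = stat / 2
    # Chi-square survival function for df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, p_value
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. The chi-square tail is written in closed form for even degrees of freedom to avoid a SciPy dependency; with the conventional 10 risk groups (df = 8) it is exact. A small Hosmer-Lemeshow p-value signals poor calibration, so the test complements AUC, which measures discrimination only.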
endocrinology & metabolism