Development and Validation of Risk Prediction Models for Large for Gestational Age Infants Using Logistic Regression and Two Machine Learning Algorithms

Ning Wang, Haonan Guo, Yingyu Jing, Yifan Zhang, Bo Sun, Xingyan Pan, Huan Chen, Jing Xu, Mengjun Wang, Xi Chen, Lin Song, Wei Cui
DOI: https://doi.org/10.1111/1753-0407.13375
IF: 4.53
2023-01-01
Journal of Diabetes
Abstract: Background: Large for gestational age (LGA) is an adverse pregnancy outcome that endangers the life and health of both mother and offspring. We aimed to establish prediction models for LGA in late pregnancy. Methods: Data were obtained from an established cohort of 1285 Chinese pregnant women. LGA was defined as a birth weight above the 90th percentile of the Chinese birth weight distribution for newborns of the same sex and gestational age. Women with gestational diabetes mellitus (GDM) were classified into three subtypes according to indexes of insulin sensitivity and insulin secretion. Models were established using logistic regression and the decision tree and random forest algorithms, and were validated on an internal validation set. Results: A total of 139 newborns were diagnosed with LGA after birth. The logistic regression model, which consisted of eight commonly used clinical indicators (including lipid profile) and GDM subtypes, had an area under the curve (AUC) of 0.760 (95% confidence interval [CI] 0.706-0.815) in the training set and 0.748 (95% CI 0.659-0.837) in the internal validation set. The two machine learning models, which included all the variables, had training set and internal validation set AUCs of 0.813 (95% CI 0.786-0.839) and 0.779 (95% CI 0.735-0.824) for the decision tree model, and 0.854 (95% CI 0.831-0.877) and 0.808 (95% CI 0.766-0.850) for the random forest model. Conclusion: We established and validated three LGA risk prediction models to identify pregnant women at high risk of LGA early in the third trimester. The models showed good predictive power and could guide early prevention strategies.
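The results above are reported as AUCs with 95% confidence intervals. As a rough illustration of how such a figure can be produced, the sketch below computes an AUC from its rank-based definition and attaches a percentile bootstrap interval. The abstract does not state how the authors derived their intervals, so the bootstrap is an assumption. The data are synthetic, and the names `auc` and `bootstrap_ci` are hypothetical helpers, not taken from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: 200 cases with roughly the cohort's LGA
# prevalence (139 / 1285, about 11%). The scores are made-up risk values.
rng = random.Random(42)
labels = [1 if i < 22 else 0 for i in range(200)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The pairwise comparison is quadratic in sample size, which is acceptable for a cohort of this scale. A rank-based implementation would be preferable for much larger data sets.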