Application of machine learning algorithms to identify people with low bone density
Rongxuan Xu,Yongxing Chen,Zhihan Yao,Wei Wu,Jiaxue Cui,Ruiqi Wang,Yizhuo Diao,Chenxin Jin,Zhijun Hong,Xiaofeng Li
DOI: https://doi.org/10.3389/fpubh.2024.1347219
IF: 5.2
2024-04-26
Frontiers in Public Health
Abstract:Background: Osteoporosis is becoming more common worldwide, imposing a substantial burden on individuals and society. The onset of osteoporosis is subtle, early detection is challenging, and population-wide screening is infeasible. Thus, there is a need to develop a method to identify those at high risk for osteoporosis. Objective: This study aimed to develop a machine learning algorithm to effectively identify people with low bone density, using readily available demographic and blood biochemical data. Methods: Using NHANES 2017–2020 data, participants over 50 years old with complete femoral neck BMD data were selected. This cohort was randomly divided into training (70%) and test (30%) sets. Lasso regression selected variables for inclusion in six machine learning models built on the training data: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), naive Bayes (NB), artificial neural network (ANN) and random forest (RF). NHANES data from the 2013–2014 cycle was used as an external validation set input into the models to verify their generalizability. Model discrimination was assessed via AUC, accuracy, sensitivity, specificity, precision and F1 score. Calibration curves evaluated goodness-of-fit. Decision curves determined clinical utility. The SHAP framework analyzed variable importance. Results: A total of 3,545 participants were included in the internal validation set of this study, of whom 1870 had normal bone density and 1,675 had low bone density Lasso regression selected 19 variables. In the test set, AUC was 0.785 (LR), 0.780 (SVM), 0.775 (GBM), 0.729 (NB), 0.771 (ANN), and 0.768 (RF). The LR model has the best discrimination and a better calibration curve fit, the best clinical net benefit for the decision curve, and it also reflects good predictive power in the external validation dataset The top variables in the LR model were: age, BMI, gender, creatine phosphokinase, total cholesterol and alkaline phosphatase. Conclusion: The machine learning model demonstrated effective classification of low BMD using blood biomarkers. This could aid clinical decision making for osteoporosis prevention and management.
public, environmental & occupational health