A machine learning model predicts stroke associated with blood cadmium level

Wenwei Zuo,Xuelian Yang
DOI: https://doi.org/10.1038/s41598-024-65633-w
2024-06-26
Abstract:Stroke is the leading cause of death and disability worldwide. Cadmium is a prevalent environmental toxicant that may contribute to cardiovascular disease, including stroke. We aimed to build an effective and interpretable machine learning (ML) model that links blood cadmium to the identification of stroke. Our data exploring the association between blood cadmium and stroke came from the National Health and Nutrition Examination Survey (NHANES, 2013-2014). In total, 2664 participants were eligible for this study. We divided these data into a training set (80%) and a test set (20%). To analyze the relationship between blood cadmium and stroke, a multivariate logistic regression analysis was performed. We constructed and tested five ML algorithms including K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), multilayer perceptron (MLP), and random forest (RF). The best-performing model was selected to identify stroke in US adults. Finally, the features were interpreted using the Shapley Additive exPlanations (SHAP) tool. In the total population, participants in the second, third, and fourth quartiles had an odds ratio of 1.32 (95% CI 0.55, 3.14), 1.65 (95% CI 0.71, 3.83), and 2.67 (95% CI 1.10, 6.49) for stroke compared with the lowest reference group for blood cadmium, respectively. This blood cadmium-based LR approach demonstrated the greatest performance in identifying stroke (area under the operator curve: 0.800, accuracy: 0.966). Employing interpretable methods, we found blood cadmium to be a notable contributor to the predictive model. We found that blood cadmium was positively correlated with stroke risk and that stroke risk from cadmium exposure could be effectively predicted by using ML modeling.
What problem does this paper attempt to address?