A Machine Learning-Based Diagnosis Modelling of Type 2 Diabetes Mellitus with Environmental Metal Exposure

Min Zhao, Jin Wan, Wenzhi Qin, Xin Huang, Guangdi Chen, Xinyuan Zhao
DOI: https://doi.org/10.1016/j.cmpb.2023.107537
IF: 6.1
2023-01-01
Computer Methods and Programs in Biomedicine
Abstract: Background and Objective: A growing body of compelling evidence indicates that urinary and dietary metal exposures are underappreciated but potentially modifiable biomarkers for type 2 diabetes mellitus (T2DM). The aims of this study were (1) to identify the key potential biomarkers contributing to T2DM using an effective and parsimonious feature set and (2) to assess the utility of baseline variables and metal exposure in the diagnosis of T2DM.
Methods: Based on the National Health and Nutrition Examination Survey (NHANES), we selected 9822 screening records with 82 significant variables covering demographics, lifestyle, anthropometric measures, diet and metal exposure. A soft voting ensemble model combining extreme gradient boosting (XGBoost), random forest and light gradient boosting machine (LightGBM) was proposed to measure the importance of the 82 features. Using this ensemble model together with the variance inflation factor (VIF), strongly multicollinear features with low importance scores were then removed from the candidate biomarkers. Finally, a soft voting ensemble classifier was adopted to demonstrate the efficiency of the proposed feature selection method.
Results: With the novel feature selection method, 12 baseline variables and 3 metal variables were selected to detect patients at risk for T2DM. Among the metal variables, dietary copper (Cu), urinary cadmium (Cd) and urinary mercury (Hg) were selected as the most notable metal exposures, and the corresponding P-values were all less than 0.05. In a T2DM classification model with 12 baseline biomarkers, adding the 3 metal exposure variables improved the area under the receiver operating characteristic (ROC) curve (AUC) from 0.792 to 0.847.
Conclusions: This was the first demonstration of T2DM classification with machine learning under urinary and dietary metal exposure. The improved predictive performance illustrates the effectiveness of the proposed machine learning-based diagnosis model, which may facilitate lifestyle and dietary interventions for T2DM prevention. (c) 2023 Elsevier B.V. All rights reserved.
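The feature selection idea in the Methods (soft voting over several models, then VIF-based removal of collinear, low-importance features) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the abstract does not state how the per-model importances were combined, which VIF cut-off was used, or the order in which features were dropped. The averaging rule, the threshold of 10, the drop-lowest-importance loop, the toy data and all function names below are assumptions made for illustration. In practice one would more likely use scikit-learn's `VotingClassifier(voting="soft")` and the `variance_inflation_factor` function from statsmodels.

```python
from math import sqrt
from statistics import mean


def soft_vote(prob_lists):
    """Soft voting: average the positive-class probabilities of several models."""
    return [mean(ps) for ps in zip(*prob_lists)]


def average_importance(importances):
    """Assumed rule: normalise each model's importances to sum to 1, then average."""
    features = list(importances[0])
    normed = []
    for imp in importances:
        total = sum(imp.values())
        normed.append({f: imp[f] / total for f in features})
    return {f: mean(n[f] for n in normed) for f in features}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each feature: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def prune_collinear(columns, importance, threshold=10.0):
    """Assumed loop: while any VIF exceeds the threshold, drop the flagged
    feature with the lowest ensemble importance."""
    kept = dict(columns)
    while len(kept) > 1:
        flagged = [f for f, v in vif(kept).items() if v > threshold]
        if not flagged:
            break
        del kept[min(flagged, key=importance.get)]
    return list(kept)


# Toy data (hypothetical): x3 is almost a copy of x1, x2 is unrelated to both.
columns = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
    "x3": [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9],
}
# Hypothetical raw importances from three models, on different scales.
importance = average_importance([
    {"x1": 5, "x2": 3, "x3": 2},
    {"x1": 0.4, "x2": 0.4, "x3": 0.2},
    {"x1": 60, "x2": 25, "x3": 15},
])
kept = prune_collinear(columns, importance)
print("VIF:", vif(columns))
print("kept features:", kept)
print("soft vote:", soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]]))
```

In this toy example `x1` and `x3` are nearly identical, so both receive a large VIF, and the one with the lower averaged importance (`x3`) is removed while the uncorrelated `x2` is retained. Normalising importances before averaging is one way to keep a model with a larger raw scale from dominating the ranking.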
What problem does this paper attempt to address?