Predicting the risk of subclinical atherosclerosis based on interpretable machine models in a Chinese T2DM population

Ximisinuer Tusongtuoheti,Yimeng Shu,Guoqing Huang,Yushan Mao
DOI: https://doi.org/10.3389/fendo.2024.1332982
IF: 6.055
2024-02-28
Frontiers in Endocrinology
Abstract:Background: Cardiovascular disease (CVD) has emerged as a global public health concern. Identifying and preventing subclinical atherosclerosis (SCAS), an early indicator of CVD, is critical for improving cardiovascular outcomes. This study aimed to construct interpretable machine learning models for predicting SCAS risk in type 2 diabetes mellitus (T2DM) patients. Methods: This study included 3084 T2DM individuals who received health care at Zhenhai Lianhua Hospital, Ningbo, China, from January 2018 to December 2022. The least absolute shrinkage and selection operator combined with random forest-recursive feature elimination were used to screen for characteristic variables. Linear discriminant analysis, logistic regression, Naive Bayes, random forest, support vector machine, and extreme gradient boosting were employed in constructing risk prediction models for SCAS in T2DM patients. The area under the receiver operating characteristic curve (AUC) was employed to assess the predictive capacity of the model through 10-fold cross-validation. Additionally, the SHapley Additive exPlanations were utilized to interpret the best-performing model. Results: The percentage of SCAS was 38.46% (n=1186) in the study population. Fourteen variables, including age, white blood cell count, and basophil count, were identified as independent risk factors for SCAS. Nine predictors, including age, albumin, and total protein, were screened for the construction of risk prediction models. After validation, the random forest model exhibited the best clinical predictive value in the training set with an AUC of 0.729 (95% CI: 0.709-0.749), and it also demonstrated good predictive value in the internal validation set [AUC: 0.715 (95% CI: 0.688-0.742)]. The model interpretation revealed that age, albumin, total protein, total cholesterol, and serum creatinine were the top five variables contributing to the prediction model. Conclusion: The construction of SCAS risk models based on the Chinese T2DM population contributes to its early prevention and intervention, which would reduce the incidence of adverse cardiovascular prognostic events.
endocrinology & metabolism
What problem does this paper attempt to address?