Development and validation of a machine learning-augmented algorithm for diabetes screening in community and primary care settings: A population-based study

XiaoHuan Liu, Weiyue Zhang, Qiao Zhang, Long Chen, TianShu Zeng, JiaoYue Zhang, Jie Min, ShengHua Tian, Hao Zhang, Hantao Huang, Ping Wang, Xiang Hu, LuLu Chen
DOI: https://doi.org/10.3389/fendo.2022.1043919
IF: 6.055
2022-01-01
Frontiers in Endocrinology
Abstract: Background: Timely screening for diabetes is crucial to reduce its related morbidity, mortality, and socioeconomic burden. Machine learning (ML) is well suited to maximizing predictive accuracy. We aimed to develop ML-augmented models for diabetes screening in community and primary care settings. Methods: A total of 8425 participants were drawn from a population-based study conducted in Hubei, China, since 2011. The dataset was split into a development set and a testing set. Seven different ML algorithms were compared to generate predictive models. Non-laboratory features were used in the ML model for community settings, and laboratory test features were additionally introduced in the ML+lab models for primary care. The area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (auPR), and the average detection cost per participant of these models were compared with those of their counterparts based on the New China Diabetes Risk Score (NCDRS), which is currently recommended for diabetes screening. Results: The AUC and auPR of the ML model were 0.697 and 0.303 in the testing set, seemingly outperforming those of NCDRS by 10.99% and 64.67%, respectively. The average detection cost of the ML model was 12.81% lower than that of NCDRS at the same sensitivity (0.72). Moreover, the average detection cost of the ML+FPG model was the lowest among the ML+lab models and lower than that of both the ML model and the NCDRS+FPG model. Conclusion: The ML model and the ML+FPG model achieved higher predictive accuracy and lower detection costs than their counterparts based on NCDRS. Thus, the ML-augmented algorithm has the potential to be employed for diabetes screening in community and primary care settings.
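The two accuracy metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`roc_auc`, `average_precision`, `relative_gain`) and the toy labels and scores are hypothetical, auPR is approximated here by the step-wise average precision, and the reported percentage improvements are assumed to be relative differences, which the abstract does not state explicitly.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve:
    the mean of the precision measured at the rank of each positive case.
    Tied scores are not handled specially in this sketch."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precisions.append(true_positives / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent
    (assumed reading of the percentages quoted in the abstract)."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("Average precision:", average_precision(toy_labels, toy_scores))
```

With screening data, where diabetes cases are a minority, auPR is more sensitive than AUC to how well the highest-ranked participants are enriched for true cases, which is one plausible reason both metrics are reported side by side.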