Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults

Xiao-lu Xiong, Rong-xin Zhang, Yan Bi, Wei-hong Zhou, Yun Yu, Da-long Zhu
DOI: https://doi.org/10.1007/s11596-019-2077-4
2019-07-25
Current Medical Science
Abstract: Type 2 diabetes mellitus (T2DM) has become a prevalent health problem in China, especially in urban areas. Early prevention strategies are needed to reduce the associated mortality and morbidity. We applied a combination of rules and different machine learning techniques to assess the risk of developing T2DM in an urban Chinese adult population. A retrospective analysis was performed on 8000 people without diabetes and 3845 people with T2DM in Nanjing. Multilayer Perceptron (MLP), AdaBoost (AD), Trees Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB) machine learning techniques with 10-fold cross-validation were used with the proposed model to predict the risk of developing T2DM. The performance of these models was evaluated with accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). The prediction accuracy of the five machine learning models was 0.87, 0.86, 0.86, 0.86 and 0.86, respectively. A combination model that gives each component the same voting weight was built for T2DM and performed better than the individual models. The findings indicate that combining machine learning models could provide an accurate assessment model for T2DM risk prediction.
medicine, research & experimental
What problem does this paper attempt to address?
This paper addresses the prediction of type 2 diabetes mellitus (T2DM) risk in urban Chinese adults, using machine learning models in a retrospective cross-sectional study. Specifically, it aims to develop a T2DM risk-prediction model suited to the urban population in China, to support early prevention and diagnosis and thereby reduce the incidence of T2DM and its related complications.

### Research Background and Problems

1. **Epidemic Trend of T2DM**
   - T2DM is a major health problem worldwide, especially in rapidly developing countries such as China, where its incidence has increased significantly alongside economic growth.
   - The incidence of T2DM in urban areas of China is significantly higher than in rural areas, which places a heavy burden on the medical system.
2. **Importance of Early Prevention**
   - Early screening and diagnosis are crucial for effective prevention of T2DM and can significantly reduce the risk of onset and complications.
   - Current clinical prediction methods may not be fully applicable to patients in different regions, given ethnic specificity and differences in economic development.
3. **Application of Machine Learning**
   - Machine learning techniques have shown great potential in medicine, especially in the diagnosis and prediction of diabetes.
   - This study used five machine learning models, Multilayer Perceptron (MLP), AdaBoost (AD), Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB), combined with rules and data-mining techniques, to evaluate and predict the risk of T2DM.

### Research Objectives

- Develop a machine-learning-based T2DM risk-prediction model that can accurately assess the risk of T2DM in the urban adult population in China.
- Compare the performance of the different machine learning models and construct a combined model to improve prediction accuracy.

### Method Overview

- **Data Collection**: Data on 8,000 people without diabetes and 3,845 people with T2DM were collected from Nanjing Gulou Hospital.
- **Feature Selection**: 11 indicators highly related to T2DM were selected: gender, age, BMI, systolic blood pressure, diastolic blood pressure, glycated hemoglobin (HbA1c), triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and fasting blood glucose.
- **Model Construction and Evaluation**: Five machine learning models were trained and evaluated with 10-fold cross-validation, and a combined model was then built from them.

### Results

- The prediction accuracy of the five individual models was similar, at about 0.86 to 0.87.
- The combined model reached an AUC of 0.97 and, according to the abstract, performed better than the individual models.
- The combined model also performed well in terms of sensitivity, specificity, and precision.

### Conclusions

- Combining multiple machine learning models can provide a more accurate T2DM risk-prediction tool.
- Applying the model to the urban adult population in China appears feasible and effective, and it can support the early prevention and management of T2DM.

### Future Work

- Further verify the applicability of the model in other populations, especially Western populations.
- Expand the sample size to confirm the scalability of the model.
- Consider including more potentially important risk factors, such as diet and nutrition and family history.

Taken together, the study supports the early prevention and diagnosis of T2DM and could help reduce its burden on public health.
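
The equal-weight voting and the evaluation metrics mentioned in the abstract can be illustrated with a small sketch. The source does not say whether the authors voted on class labels or on predicted probabilities, so the code below assumes a simple unweighted majority vote over 0/1 labels. The function names `equal_weight_vote` and `binary_metrics` and all of the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def equal_weight_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions from several models by unweighted majority vote.

    Assumption: hard voting on labels; the paper only says the components
    share the same voting weight.
    """
    n_models = len(predictions)
    combined = []
    for votes in zip(*predictions):
        # Every model casts one vote of equal weight; a strict majority of
        # positive votes yields a positive (T2DM) label.
        combined.append(1 if 2 * sum(votes) > n_models else 0)
    return combined


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity and specificity from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }


# Toy labels (1 = T2DM, 0 = no diabetes), invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_outputs = [
    [1, 1, 0, 0, 0, 1, 0, 1],  # stand-in for MLP
    [1, 0, 1, 0, 0, 0, 0, 1],  # stand-in for AdaBoost
    [1, 1, 1, 0, 1, 0, 0, 0],  # stand-in for random forest
    [0, 1, 1, 0, 0, 0, 0, 1],  # stand-in for SVM
    [1, 1, 0, 0, 0, 0, 1, 1],  # stand-in for gradient tree boosting
]

y_vote = equal_weight_vote(model_outputs)
metrics = binary_metrics(y_true, y_vote)
print(metrics)
```

In this toy data each stand-in model makes two mistakes, but the mistakes fall on different samples, so the majority vote recovers every label. That is the intuition behind combining models; it says nothing about how large the gain was in the study itself. With an odd number of binary voters, such as the five models here, a tie cannot occur.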