Abstract: Early screening for diabetes can promptly identify patients at an early stage of the disease, potentially delaying complications and reducing mortality. This paper presents a novel technique for early diabetes screening and prediction, the Attention-Enhanced Deep Neural Network (AEDNN). The proposed AEDNN model combines an Attention-based Feature Weighting Layer with deep neural network layers to achieve precise diabetes prediction. In this study, we used the Diabetes-NHANES dataset and the Pima Indians Diabetes dataset. Group median imputation was applied to handle the substantial missing values and outliers, and oversampling was used to balance the diabetes and non-diabetes groups. The data were passed through the Attention-based Feature Weighting Layer for feature extraction, producing a feature matrix. The Hadamard product of this matrix with the raw data yielded weighted data, which were then fed into the deep neural network layers for training. The parameters were fine-tuned, and L2 regularization and dropout layers were added to improve the model's generalization performance. The model's reliability was assessed using accuracy, precision, recall, F1 score, mean squared error (MSE), and R² score, as well as the ROC curve and AUC. The proposed model achieved a prediction accuracy of 98.4% on the Pima Indians Diabetes dataset. When testing was extended to the large-scale Diabetes-NHANES dataset, which contains 52,390 samples, the test precision of the model improved further to 99.82%, with an AUC of 0.9995. A comparative analysis was conducted against multiple models, including logistic regression with L1 regularization, support vector machine (SVM), random forest, K-nearest neighbors (KNN), AdaBoost, XGBoost, and the latest semi-supervised XGBoost.
The attention-based feature extraction method was also compared with the classical feature selection methods Lasso and Ridge. All experiments were performed on the same dataset, and the Attention-Enhanced Deep Neural Network (AEDNN) outperformed all of the aforementioned methods. These results indicate that the model not only performs well on smaller datasets but also fully leverages its advantages on larger ones, demonstrating strong generalization ability and robustness. The proposed model can effectively assist clinicians in the early screening of patients with diabetes. It is particularly useful for the preliminary screening of high-risk individuals in large-scale healthcare datasets, who can then be referred for detailed examination and diagnosis. Compared with existing methods, our AEDNN model showed an overall performance improvement of 1.75%.
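The preprocessing and feature-weighting steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Attention-based Feature Weighting Layer is parameterized, so the version below assumes a single linear scoring map followed by a softmax. The names `group_median_impute`, `attention_weighting`, `W`, and `b` are hypothetical, and in a real model `W` and `b` would be learned during training rather than fixed by hand. Only the group median imputation and the Hadamard product with the raw data are taken from the abstract.

```python
import math
from statistics import median


def group_median_impute(rows, labels):
    """Replace missing entries (None) with the median of the same feature
    computed within the row's label group.

    Assumes every group has at least one observed value per feature.
    """
    n_features = len(rows[0])
    medians = {}
    for label in set(labels):
        group = [r for r, y in zip(rows, labels) if y == label]
        medians[label] = [
            median([r[j] for r in group if r[j] is not None])
            for j in range(n_features)
        ]
    return [
        [medians[y][j] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighting(row, W, b):
    """Assumed attention layer: linear scores -> softmax -> Hadamard product.

    W (square matrix) and b (vector) are hypothetical parameters that a
    trained model would learn.
    """
    n = len(row)
    scores = [sum(W[i][j] * row[j] for j in range(n)) + b[i] for i in range(n)]
    weights = softmax(scores)
    # Element-wise (Hadamard) product of attention weights and raw features.
    weighted = [w * x for w, x in zip(weights, row)]
    return weights, weighted


# Toy data: two features, two label groups, None marks a missing value.
rows = [[1.0, None], [3.0, 4.0], [None, 10.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]
imputed = group_median_impute(rows, labels)

# With all-zero parameters every feature receives the same attention weight.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
weights, weighted = attention_weighting(imputed[0], W, b)
```

In this sketch the weighted vector, not the raw row, is what would be passed on to the deep neural network layers.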
