Abstract: Globally, diabetes affects 537 million people, making it the deadliest and most common non-communicable disease. Many factors can lead to diabetes, including excess body weight, abnormal cholesterol levels, family history, physical inactivity, and poor dietary habits. Increased urination is one of its most common symptoms. People who have had diabetes for a long time can develop complications such as heart disease, kidney disease, nerve damage, and diabetic retinopathy, but the risk can be reduced if the disease is predicted early. In this paper, an automatic diabetes prediction system has been developed using a private dataset of female patients in Bangladesh and various machine learning techniques. The authors used the Pima Indian diabetes dataset and collected additional samples from 203 individuals at a local textile factory in Bangladesh. Mutual information was applied as the feature selection algorithm. A semi-supervised model with extreme gradient boosting was used to predict the insulin features of the private dataset. The SMOTE and ADASYN approaches were employed to manage the class imbalance problem. The authors compared machine learning classification methods, namely decision tree, SVM, random forest, logistic regression, KNN, and various ensemble techniques, to determine which algorithm produces the best predictions. After training and testing all the classification models, the proposed system achieved its best result with the XGBoost classifier combined with the ADASYN approach: 81% accuracy, an F1 score of 0.81, and an AUC of 0.84. Furthermore, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. Explainable AI techniques based on the LIME and SHAP frameworks were used to understand how the model arrives at its predictions. Finally, a website framework and an Android smartphone application have been developed to accept the input features and predict diabetes instantaneously. The private dataset of female Bangladeshi patients and the programming code are available at the following link: https://github.com/tansin-nabil/Diabetes-Prediction-Using-Machine-Learning.
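
To make the mutual information feature selection step mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the features have already been discretized into bins. Continuous clinical measurements would need binning or a dedicated estimator first. The feature names, the toy values, and the helper functions `mutual_information` and `rank_features` are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    count_x = Counter(feature)             # marginal counts of feature values
    count_y = Counter(labels)              # marginal counts of class labels
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (count_x * count_y)
        mi += p_xy * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Score each feature and return names sorted from most to least informative."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical toy data: binned feature values, not the paper's dataset.
columns = {
    "glucose_bin": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "age_bin": ["old", "young", "old", "young", "old", "young", "young", "old"],
}
labels = [1, 1, 0, 0, 1, 0, 1, 0]

ranking, scores = rank_features(columns, labels)
for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy example the first feature determines the label exactly, so its score equals the label entropy, while the second feature is independent of the label and scores zero. A feature selection step would keep the top-ranked features and discard the rest before training a classifier.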
