Towards Reliable Diabetes Prediction: Innovations in Data Engineering and Machine Learning Applications

Md. Alamin Talukder, Md Manowarul Islam, Md Ashraf Uddin, Mohsin Kazi, Majdi Khalid, Arnisha Akhter, Mohammad Ali Moni
DOI: https://doi.org/10.1101/2024.07.14.603436
2024-07-17
Abstract: Objective: Diabetes is a metabolic disorder marked by excess sugar in the blood; it raises the risk of stroke, heart disease, kidney failure, and other long-term complications. Machine learning (ML) models can aid in diagnosing diabetes at an early stage, which calls for an efficient model that diagnoses it accurately. Methods: In this paper, we implement a data preprocessing pipeline and apply random oversampling to balance the class distribution of the observational data. We conducted our experiments on four different diabetes datasets and compared several ML algorithms to determine the best model for predicting diabetes on each. Results: The performance analysis demonstrates that Random Forest (RF) surpasses existing works with accuracy of 86% and 98.48% on dataset-1 and dataset-2, while XGBoost (XGB) and Decision Tree (DT) do so with accuracy of 99.27% and 100% on dataset-3 and dataset-4, respectively. Our approach can increase accuracy by 12.15% compared to the model without preprocessing. Conclusions: These findings indicate that the proposed models might be employed to produce more accurate diabetes predictions, supplementing current preventive interventions to reduce the incidence of diabetes and its associated costs.
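The random oversampling step named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the toy glucose/BMI records, and the fixed seed are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): [glucose, BMI] for 6 non-diabetic (0) and 2 diabetic (1) records.
X = [[85, 22.1], [90, 24.3], [99, 26.0], [88, 23.5],
     [92, 25.2], [95, 27.8], [160, 33.4], [172, 35.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have the same count
```

In practice, oversampling is usually applied only to the training split, so that duplicated records do not leak into the evaluation data; whether the paper follows this convention is not stated in the abstract.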
Bioinformatics