Rule Mining of Early Diabetes Symptom and Applied Supervised Machine Learning and Cross Validation Approaches based on the Most Important Features to Predict Early-stage Diabetes
Mahade Hasan, Farhana Yasmin, Linhong Deng
DOI: https://doi.org/10.37082/ijirmps.v11.i3.230225
2023-06-28
International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
Abstract: Diabetes is a chronic disease and one of the most prevalent illnesses, with a significant impact on the population. Although diabetes has many possible causes, the most common include age, excess body fat, frailty, and rapid weight loss, among other conditions. Patients with diabetes are more susceptible to a number of complications, including heart disease, kidney problems, nerve damage, blood-vessel damage, and blindness. The disease is challenging to diagnose, and predicting how it will progress is both costly and difficult. Machine learning (ML) offers considerable potential for building applications that support earlier detection, diagnosis, and treatment of many disorders, which is why medical experts are particularly interested in it. This study aims to develop a model that can reliably and precisely identify diabetes. In addition, association rule mining was employed to find the symptoms that most commonly co-occur in diabetic patients. The study also presents a diabetes prediction model that uses several machine learning approaches to improve diabetes classification and prediction accuracy. The machine learning methods used for early-stage diabetes prediction include Gaussian Naive Bayes, ExtraTreesClassifier, Decision Trees, K-Nearest Neighbors, Random Forest Classifier, Support Vector Machine, and Logistic Regression. The most important attributes of the dataset were then selected using six different feature-selection methods. Next, a total of 10 models for early-stage diabetes prediction were applied to the dataset restricted to these selected features. The models' performance was evaluated using metrics including accuracy, precision, recall, and F-measure.
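The abstract does not say which rule-mining algorithm was used. A minimal sketch, assuming an Apriori-style level-wise search over binary symptom records, shows how frequent symptom combinations are found and how each rule A → C is scored by support, confidence = support(A ∪ C) / support(A), and lift = confidence / support(C). The symptom names, the five toy records, and the thresholds (`min_support=0.4`, `min_confidence=0.9`) are hypothetical and not taken from the study's data.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the symptoms reported by one
# diabetic patient. Illustrative only, not data from the study.
RECORDS = [
    {"polyuria", "polydipsia", "weakness"},
    {"polyuria", "polydipsia", "sudden_weight_loss"},
    {"polyuria", "weakness"},
    {"polydipsia", "polyuria", "sudden_weight_loss", "weakness"},
    {"weakness", "obesity"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    items = sorted({i for r in records for i in r})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), records) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, records)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if support(c, records) >= min_support]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent.
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                a = frozenset(ante)
                c = itemset - a
                conf = sup / frequent[a]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((a, c, sup, conf, conf / frequent[c]))
    return out


if __name__ == "__main__":
    found = rules(frequent_itemsets(RECORDS, min_support=0.4), min_confidence=0.9)
    # Print the strongest associations (highest lift) first.
    for ante, cons, sup, conf, lift in sorted(found, key=lambda r: -r[4]):
        print(f"{sorted(ante)} -> {sorted(cons)}  "
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the consequent symptoms appear together with the antecedent more often than their overall frequency alone would suggest.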
The performance metrics show that the ExtraTreesClassifier performed best, achieving a perfect score of 100% in accuracy, recall, precision, and F1 score. Therefore, we can assert that our ExtraTreesClassifier model outperforms previously published work. Clinicians who read this article will gain new insights and be better able to identify early-stage diabetes.
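The title mentions cross-validation and the abstract reports accuracy, precision, recall, and F1 score, but neither the number of folds nor any classifier settings are given. The sketch below, assuming 5 folds with plain (unstratified) shuffling, computes those four metrics from a confusion matrix and averages them over held-out folds. To stay self-contained it uses K-Nearest Neighbors, one of the listed methods, with Hamming distance and `k=3` (both assumptions); the helper names and the ten binary symptom vectors are hypothetical, and the study's ExtraTreesClassifier results are not reproduced here.

```python
import random
from statistics import mean

# Hypothetical binary symptom vectors (1 = symptom present) and labels
# (1 = diabetic). Illustrative only, not data from the study.
TOY_X = [
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0],
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
]
TOY_Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    # Precision/recall are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n, folds, seed=0):
    """Shuffle 0..n-1 once, then deal the indices into disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest in Hamming distance."""
    dists = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def cross_validate(X, y, folds=5, k=3, seed=0):
    """Train on folds-1 folds, score the held-out fold, average the metrics."""
    per_fold = []
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        per_fold.append(binary_metrics([y[i] for i in test_idx], preds))
    return {name: mean(f[name] for f in per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    print(cross_validate(TOY_X, TOY_Y))
```

Averaging over folds scores every record exactly once as unseen test data. A stratified split, which preserves the class ratio in each fold, is a common refinement not shown here.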