An Improved Machine Learning Prediction Model for Diabetes

Aayushi Bansal, Anita Singhrova
DOI: https://doi.org/10.1007/978-981-16-3690-5_13
2021-11-09
Abstract: Diabetes is a chronic disease that can cause serious complications if it is not treated in time. Machine learning has proved useful in diagnosing diabetes and can thereby help save lives. The PIMA diabetes dataset used in this work contains many missing and outlier values, which need to be treated. According to medical practitioners, the values of insulin, skin thickness and BMI can never be zero, and diabetic and non-diabetic patients have different ranges of values for the various parameters. A technique is therefore proposed that replaces the missing and outlier values with the respective (diabetic or non-diabetic) medians. After preprocessing, the PCA algorithm is applied for dimensionality reduction, and two similar clusters are then formed using the k-means clustering algorithm. The predicted outputs of the various classifiers are given as input to a voting classifier, which uses soft voting to make the final prediction. The proposed technique achieves an accuracy of 98.70%, and improvements in precision, recall and F1 score are also noted. The mean squared error of the proposed algorithm is much lower than that of the previous classifiers.
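The two steps the abstract describes most concretely, class-wise median replacement and soft voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the column names (`insulin`, `bmi`, `outcome`), the toy rows and the probability values are assumptions made for illustration, only zeros are treated as missing (the abstract does not say how outliers are detected), and the PCA and k-means stages are omitted.

```python
from statistics import median


def impute_classwise_median(rows, columns, label="outcome"):
    """Return copies of rows in which a zero in `columns` is replaced by the
    median of the non-zero values seen in the same class (same `label`)."""
    medians = {}
    for col in columns:
        for cls in {r[label] for r in rows}:
            observed = [r[col] for r in rows if r[label] == cls and r[col] != 0]
            if observed:  # skip classes with no usable value for this column
                medians[(col, cls)] = median(observed)

    imputed = []
    for r in rows:
        new = dict(r)  # leave the input rows untouched
        for col in columns:
            if new[col] == 0 and (col, new[label]) in medians:
                new[col] = medians[(col, new[label])]
        imputed.append(new)
    return imputed


def soft_vote(probabilities):
    """Average the per-classifier class probabilities and return the index
    of the class with the highest mean probability."""
    n_classes = len(probabilities[0])
    mean = [
        sum(p[k] for p in probabilities) / len(probabilities)
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: mean[k])


# Hypothetical toy records; outcome 1 = diabetic, 0 = non-diabetic.
rows = [
    {"insulin": 0, "bmi": 30.0, "outcome": 1},
    {"insulin": 200, "bmi": 35.0, "outcome": 1},
    {"insulin": 180, "bmi": 0, "outcome": 1},
    {"insulin": 90, "bmi": 22.0, "outcome": 0},
    {"insulin": 0, "bmi": 24.0, "outcome": 0},
]
cleaned = impute_classwise_median(rows, columns=("insulin", "bmi"))

# Hypothetical [P(non-diabetic), P(diabetic)] outputs of three classifiers.
prediction = soft_vote([[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]])
```

In this toy input the missing insulin value of the first diabetic record is filled from the diabetic records only, so it is not pulled toward the lower non-diabetic value. The voting example is chosen so that two of the three classifiers lean slightly towards class 0, yet the averaged probabilities favour class 1, which is the situation where soft voting differs from a simple majority vote.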