Diabetes prediction model based on data enhancement and algorithm ensemble

Wei Wang, Gaochao Xu, Yang Li, Zhenjun Jin, Long Li, Zhiqiang Dai
DOI: https://doi.org/10.1117/12.2631136
2022-03-18
Abstract: Diabetes is a chronic disease characterized by hyperglycemia. According to published estimates, the number of people with diabetes reached 537 million in 2021. This rapid increase has made diabetes a global health threat. Diabetes often goes undiagnosed for a long time; during that period the body remains in a state of high blood sugar, which can cause serious complications. Early detection and treatment can greatly reduce the harm caused by the disease. In this context, this paper proposes a diabetes prediction model based on the PIMA Indians Diabetes Dataset (PID) published by the University of California at Irvine. In terms of architecture, the model is hosted on a cloud server, which makes full use of the server's computing power and makes it easier to update the model iteratively. The model has two parts. The first part uses a neural network to raise the dimension of the data. This step is inspired by Cover's theorem: data mapped nonlinearly into a higher-dimensional space is easier to separate. While raising the dimension, a loss function is used to strengthen the separability of the resulting data. The second part trains the base classification algorithms on the raised-dimension data and integrates the trained classifiers by stacking to obtain the final model. The model achieves a prediction accuracy of 89.72% on the test set, a significant improvement over other algorithms proposed for the PID dataset.
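The abstract's appeal to Cover's theorem can be illustrated with a small sketch. It is not the authors' network: the feature map `lift` below is hand-picked rather than learned, the XOR points stand in for real data, and the names `lift` and `train_perceptron` are hypothetical. It only shows why a nonlinear map to a higher dimension can make classes separable by a linear classifier.

```python
# Illustrative sketch only: a hand-picked feature map, not the paper's learned network.

def lift(x):
    """Map a 2-D point to 3-D by appending the product feature x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def train_perceptron(points, labels, epochs=1000, lr=1.0):
    """Train a linear perceptron; return (weights, bias, converged)."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):  # labels are +1 / -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # a full pass with no mistakes: the data is separated
            return w, b, True
    return w, b, False


# XOR: no straight line separates the two classes in the original 2-D space.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]

_, _, separable_2d = train_perceptron(xor_points, xor_labels)
_, _, separable_3d = train_perceptron([lift(p) for p in xor_points], xor_labels)

print("linearly separable in 2-D:", separable_2d)
print("linearly separable after lifting to 3-D:", separable_3d)
```

The perceptron never reaches an error-free pass on the raw XOR points, but it converges once the product feature is added. In the paper's setting, a neural network trained with a separability-oriented loss would play the role of `lift`, and the stacked base classifiers would replace the single perceptron.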