A two-stage prediction filling method with support vector technologies optimized competitively in stages by grey wolf optimizer and particle swarm optimization for missing fasting blood glucose

Wenlong Gao, Jingxiang Xie, Yongsong Ke, Maoyun Tian, Zhimei Zeng, Xiaojie Ma, Minqian Zhi
DOI: https://doi.org/10.1177/09544119231206456
2023-10-25
Proceedings of the Institution of Mechanical Engineers Part H Journal of Engineering in Medicine
Abstract: Missing values often limit how well data from epidemiological surveys can be used. In this study, we use the cut-off value of the medical diagnostic standard for diabetes based on fasting blood glucose to divide the fasting blood glucose test data from the 2009 China Health and Nutrition Survey (CHNS) for Shandong province into two classes: normal and abnormal. To fill missing fasting blood glucose values, we propose a two-stage prediction filling method. Its support vector models are optimized competitively by particle swarm optimization (PSO) or grey wolf optimizer (GWO). In the first stage, a support vector machine (SVM) predicts the class of each missing value. In the second stage, support vector regression (SVR) predicts the missing value within the predicted class. We use LIBSVM as the gold standard to train both the SVM and the SVR in the two stages. When the two competing optimizers are compared at each stage, GWO gives the highest classification accuracy in the first stage (91.1%), and PSO gives the smallest in-class mean absolute error (0.48) in the second stage. GWO-SVM-PSO-SVR is therefore selected as the optimal model, and its predicted value serves as the fill value. In the empirical comparison, this model also outperforms all of the other filling models in mean absolute error and mean absolute percentage error. A sensitivity analysis further shows that it tolerates changes in sample size well and remains stable.
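The pipeline described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The 7.0 mmol/L cut-off is assumed; it is the common diagnostic threshold, but the abstract does not state the value used. The helper names (`label_fbg`, `two_stage_fill`, `gwo_minimize`) are hypothetical. The classifier and regressors are plain callables, whereas the paper uses LIBSVM-trained SVM and SVR. In the paper's setup, a grey wolf optimizer, or a PSO routine with the same interface (omitted here), would search the SVM/SVR hyperparameters. Which hyperparameters are searched is not stated in the abstract; the penalty C and the RBF kernel width are assumed examples. The search would minimize, for instance, one minus classification accuracy in stage 1 and in-class MAE in stage 2.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# ASSUMPTION: 7.0 mmol/L is the usual fasting-glucose cut-off for diabetes;
# the abstract does not give the exact value the authors used.
FBG_CUTOFF = 7.0


def label_fbg(value: float, cutoff: float = FBG_CUTOFF) -> int:
    """Class 0 = normal, class 1 = abnormal (at or above the cut-off)."""
    return int(value >= cutoff)


def two_stage_fill(
    features: Sequence[float],
    classify: Callable[[Sequence[float]], int],
    regressors: Dict[int, Callable[[Sequence[float]], float]],
) -> float:
    """Stage 1 predicts the class; stage 2 applies that class's own regressor."""
    predicted_class = classify(features)
    return regressors[predicted_class](features)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def gwo_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 10,
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal grey wolf optimizer: each wolf moves toward the three best
    wolves (alpha, beta, delta). Returns the best position ever evaluated."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: List[float]) -> List[float]:
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    scores = [objective(w) for w in wolves]
    best_i = min(range(n_wolves), key=scores.__getitem__)
    best_pos, best_score = wolves[best_i][:], scores[best_i]

    for t in range(n_iter):
        order = sorted(range(n_wolves), key=scores.__getitem__)
        leaders = [wolves[i][:] for i in order[:3]]  # snapshot of alpha, beta, delta
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += leader[d] - A * abs(C * leader[d] - wolves[i][d])
                new_pos.append(total / 3.0)  # average of the three pulls
            wolves[i] = clip(new_pos)
            scores[i] = objective(wolves[i])
            if scores[i] < best_score:
                best_pos, best_score = wolves[i][:], scores[i]
    return best_pos, best_score


if __name__ == "__main__":
    # Hypothetical stand-in models, NOT trained SVM/SVR.
    classify = lambda x: label_fbg(x[0])
    regressors = {0: lambda x: 5.0, 1: lambda x: 8.5}
    print(two_stage_fill([6.1], classify, regressors))  # 5.0
    print(two_stage_fill([7.4], classify, regressors))  # 8.5
    # GWO on a toy sphere function standing in for a tuning objective.
    print(gwo_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2))
```

Routing each missing value through a class-specific regressor is what distinguishes this approach from a single regressor fitted to all records. Swapping `gwo_minimize` for a PSO routine at either stage reproduces the competitive per-stage choice that the abstract describes.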
engineering, biomedical