Predicting the Risk of Stroke Based on Imbalanced Data Set with Missing Data

Haoyu Xie, Xuwei Fan, Yidan Zhang, Yihong Zhan, Weijian Xu, Lianfen Huang
DOI: https://doi.org/10.1109/icetci55101.2022.9832169
2022-01-01
Abstract: Preventing stroke in daily life and assessing the likelihood of illness as early as possible have become important problems. In this paper, we predict the likelihood of stroke from the physiological indicators of the at-risk population. We first address two common problems in medical data sets, missing data and class imbalance, and then combine multiple simple neural networks with an XGBoost tree model through ensemble learning. After training this strong classifier on the processed data set, we use it to predict the incidence of stroke and compare its results with those of other classifiers. Over several experiments, we obtained an average accuracy of 75.77% and an average specificity of 78.8%, which means that our model can better identify high-risk individuals while maintaining a reasonable overall accuracy. Overall, our work can quickly estimate disease risk from easily accessible physiological indicators and provide a reference for the early diagnosis and treatment of stroke.
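The two reported metrics, accuracy and specificity, are both derived from a binary confusion matrix. The following is a minimal sketch of how they could be computed, assuming labels are coded as 1 for stroke and 0 for no stroke. The names `ConfusionCounts`, `confusion_counts`, `accuracy`, `specificity`, and `sensitivity` are hypothetical helpers introduced here for illustration, and the toy labels are made up; nothing below reproduces the paper's data or results. Sensitivity is included only for comparison, since specificity is the true negative rate while sensitivity is the fraction of actual positive cases that are detected.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # actual stroke, predicted stroke
    fp: int  # actual no stroke, predicted stroke
    tn: int  # actual no stroke, predicted no stroke
    fn: int  # actual stroke, predicted no stroke


def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells (assumed coding: 1 = stroke, 0 = no stroke)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all predictions that are correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def specificity(c):
    """True negative rate: TN / (TN + FP)."""
    negatives = c.tn + c.fp
    return c.tn / negatives if negatives else 0.0


def sensitivity(c):
    """True positive rate (recall): TP / (TP + FN)."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


# Made-up toy labels for illustration only (2 stroke cases, 8 non-stroke cases).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(counts), specificity(counts), sensitivity(counts))
```

On an imbalanced data set such as this toy example, accuracy alone can look acceptable even when half of the positive cases are missed, which is one reason to report class-specific rates alongside it.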