Integrated Learning Model Based on GC-Stacking for Early Prediction of Diabetes Mellitus

Xiaoxia Li, Jianjun Zhang, Peishun Liu, Ruichun Tang, Qing Guo, Qinshuo Wang
DOI: https://doi.org/10.1109/pic53636.2021.9687044
2021-01-01
Abstract: Diabetes mellitus (DM) prediction enables timely, targeted treatment and intervention in the early stages of DM, and is important for reducing the incidence of DM and analyzing its risk factors. In this paper, we propose GC-Stacking, an integrated (ensemble) learning model based on a Genetic Algorithm (GA) and an improved CatBoost method. First, we use the global search capability of the GA to select the optimal subset of features associated with diabetes risk factors. Then, the improved CatBoost method is combined with KNN, SVM, and other algorithms with strong predictive performance as the primary (base) learners, and a stacking ensemble strategy is adopted, with random forest (RF) as the secondary (meta-) learner used to train the integrated prediction model. This model uses the selected features to predict diabetes. The model was validated on the Qingdao CDC physical examination dataset and the UCI public diabetes dataset. Under 7-fold cross-validation, the GC-Stacking model showed better predictive performance, outperforming the other algorithms compared in accuracy, F1-score, and other performance metrics.
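The two-stage pipeline described above (GA feature selection, then a stacked ensemble with an RF meta-learner, evaluated by 7-fold cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Synthetic data from `make_classification` stands in for the Qingdao CDC and UCI datasets. The GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the population size, and the KNN-based fitness function are all assumed, since the abstract does not specify them. The abstract also does not describe how CatBoost was improved, so `GradientBoostingClassifier` is used here as a plain gradient-boosting stand-in; the "other algorithms" in the base layer are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 7-fold stratified CV, matching the fold count reported in the abstract.
CV = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)


def fitness(mask, X, y):
    """Assumed GA fitness: mean 7-fold CV accuracy of a scaled KNN on the selected columns."""
    if not mask.any():
        return 0.0  # an empty feature subset is worthless
    model = make_pipeline(StandardScaler(), KNeighborsClassifier())
    return cross_val_score(model, X[:, mask], y, cv=CV).mean()


def ga_select(X, y, pop_size=12, generations=8, p_mut=0.05, seed=0):
    """Minimal GA over binary feature masks (hypothetical operators and settings)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5  # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])

        def pick():
            # Tournament selection: keep the fitter of two random individuals.
            i, j = rng.integers(pop_size, size=2)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        children = [pop[scores.argmax()].copy()]  # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = pick(), pick()
            cut = rng.integers(1, n)  # one-point crossover position
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[scores.argmax()]


def build_stack():
    """Stacking: boosting + KNN + SVM as base learners, random forest as meta-learner."""
    base = [
        # Stand-in for the paper's improved CatBoost (modifications unknown).
        ("gbdt", GradientBoostingClassifier(random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ]
    # cv=7: out-of-fold base-learner outputs are used to train the RF meta-learner.
    return StackingClassifier(estimators=base,
                              final_estimator=RandomForestClassifier(random_state=0),
                              cv=7)


# Synthetic binary data standing in for the diabetes datasets.
X, y = make_classification(n_samples=300, n_features=12, n_informative=5, random_state=0)
mask = ga_select(X, y)
res = cross_validate(build_stack(), X[:, mask], y, cv=CV, scoring=("accuracy", "f1"))
acc, f1 = res["test_accuracy"].mean(), res["test_f1"].mean()
print("selected features:", np.flatnonzero(mask))
print("7-fold accuracy: %.3f  F1: %.3f" % (acc, f1))
```

Using a cheap KNN as the GA fitness keeps the search fast; evaluating the full stack inside the GA would be closer to an end-to-end wrapper method but far more expensive. If the `catboost` package is installed, `catboost.CatBoostClassifier` could replace the `gbdt` entry, though that would still be standard CatBoost rather than the improved variant the paper describes.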