A Risk Prediction Model for Type 2 Diabetes Based on Weighted Feature Selection of Random Forest and XGBoost Ensemble Classifier

Zhongxian Xu, Zhiliang Wang
DOI: https://doi.org/10.1109/ICACI.2019.8778622
2019-01-01
Abstract: Type 2 diabetes mellitus is a severe chronic disease that threatens human health and has a high incidence worldwide. Effective prediction models are needed to diagnose and prevent diabetes in time. Data mining, with its classification capability, has become increasingly important in the field of medical diagnosis. This paper proposes a risk prediction model for type 2 diabetes based on an ensemble learning method. In the proposed model, a weighted feature selection algorithm based on random forest (RF-WFS) is used for optimal feature selection, and an extreme gradient boosting (XGBoost) classifier is used for classification. The effectiveness of the method was validated by comparing performance metrics across several contrast experiments. The method also achieves better prediction accuracy than other classification algorithms (C4.5, Naive Bayes, AdaBoost, Random Forest). Validation on the UCI Pima Indian diabetes dataset shows that the model has better accuracy and classification performance than other results reported in the literature. These results indicate that the model would be effective for the diagnosis of diabetes at the initial stage.
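The two-stage pipeline described in the abstract (feature selection weighted by random forest importance, followed by a boosted-tree classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how RF-WFS weights or thresholds features, so the rule used here (keep features whose importance is at least the mean importance) is an assumption, as is the helper name `select_features_by_rf_importance`. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and the data are synthetic rather than the Pima Indian dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split


def select_features_by_rf_importance(X, y, random_state=0):
    """Hypothetical helper: rank features by random forest importance.

    Assumption: a feature is kept when its importance is at least the
    mean importance. The paper's RF-WFS rule may differ.
    """
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    weights = rf.feature_importances_  # non-negative, sums to 1
    keep = np.flatnonzero(weights >= weights.mean())
    return keep, weights


# Synthetic stand-in data with 8 features (not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: choose features using the training split only.
selected, weights = select_features_by_rf_importance(X_train, y_train)

# Stage 2: gradient-boosted trees on the selected features
# (a stand-in for the XGBoost classifier named in the abstract).
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train[:, selected], y_train)
accuracy = clf.score(X_test[:, selected], y_test)

print("selected feature indices:", selected.tolist())
print("held-out accuracy:", round(accuracy, 3))
```

Selecting features on the training split only keeps the held-out accuracy from being inflated by information leaking from the test data. With the `xgboost` package installed, `xgboost.XGBClassifier` could be dropped in for the stand-in classifier, since it follows the same `fit`/`score` interface.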