Cardiovascular Risk Prediction Method Based On Cfs Subset Evaluation And Random Forest Classification Framework

Shan Xu,Zhen Zhang,Daoxian Wang,Junfeng Hu,Xiaohui Duan,Tiangang Zhu
DOI: https://doi.org/10.1109/ICBDA.2017.8078813
2017-01-01
Abstract:Cardiovascular Disease (CVD) is a highly significant contributor to loss of quality and quantity of life all over the world. Early detection and risk prediction is very important for patients' treatment and doctors' diagnose. This paper focus on establishing a more accurate and practical risk prediction system based on data mining techniques to provide auxiliary medical service. In order to be practically used for collecting and analyzing patients' data in healthcare industries, the system consists of four parts: data interface, data preparation, feature selection and classification. Data interface response to obtain hospitals' raw data from hospital; data preprocessing is needed for data integration, data cleaning and rating mapping etc. Key features were then selected by CFS Subset Evaluation combined with Best-First-Search method to reduce dimensionality. Random forest was inducted as basic classifier to identify risk level, which is a prior trial in CVD risk prediction field. Cleveland Heart-Disease Database (CHDD) and Cardiology inpatient dataset of PKU People's Hospital were both tested to confirm accuracy as well as practicality. In CHDD test, our system has a significantly higher accuracy of 91.6% than other methods. In People's Hospital dataset test, it achieves an accuracy of 97%, which is better than most of other classifiers except SVM (98.9%), however random forest only take half of time than SVM. Comprehensively considering the risk prediction system shows great significance in accuracy and practical use for patients' treatment and doctors' diagnose.
What problem does this paper attempt to address?