Predicting Survivability of Colorectal Cancer by an Ensemble Classification Method Improved on Random Forest

Yuyan WANG, Dujuan WANG, Yanzhang WANG, Jin Yaochu
DOI: https://doi.org/10.3969/j.issn.1672-0334.2017.01.009
2017-01-01
Journal of Management Science
Abstract: Cancer is one of the major causes of human death and accounts for a large proportion of healthcare costs in many countries. Predicting cancer survivability is an important task in cancer prognosis and a challenging research problem; accurate predictions can help doctors make better diagnostic and treatment decisions and lower treatment costs. In recent years, data-driven methods for cancer survivability prediction have gradually been put into practice, yet improving their accuracy remains an active area of research, since accuracy is the main index for evaluating a prediction method. This paper focuses on colorectal cancer, which has both high incidence and high mortality. To make survivability prediction for colorectal cancer more accurate, an ensemble classification method based on GA-RF is proposed. The method uses a genetic algorithm (GA) to improve the random forest (RF): the genetic algorithm searches for a subset of the decision trees in the random forest, aiming at better ensemble classification accuracy. The proposed method, the decision tree method, and the random forest method after parameter optimization are used to develop models that predict the survivability of patients with colorectal cancer. Using the colorectal cancer data set of the SEER database, the three methods are tested by 10-fold cross-validation and compared in terms of accuracy, sensitivity, and specificity. The experimental results indicate that the ensemble classification method based on GA-RF achieves a prediction accuracy of 88.2%, higher than that of both the random forest after parameter optimization and the decision tree. The random forest ranks second, also with a high accuracy of 86.4%, but its ensemble is much more complex than that of the GA-RF method; the decision tree performs worst of the three, with 74.2% accuracy. In addition, the GA-RF method shows the best generalization ability. The proposed ensemble classification method effectively improves on the random forest: it can predict the survivability of colorectal cancer with higher efficiency and accuracy, provide a reference for decision-making in colorectal cancer prognosis, compensate for the shortcomings of experience-based survivability prediction, and has practical significance for saving medical resources, reducing medical costs, and improving patient satisfaction.
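The abstract does not give the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of the general idea of using a genetic algorithm to pick a subset of trees from a forest. It assumes a binary mask over trees, majority voting, validation accuracy as fitness, one-point crossover, bit-flip mutation, and elitism. The "trees" are synthetic noisy voters rather than trees trained on SEER data, and every name here (`ga_select`, `majority_vote`, `metrics`, `fitness`) is hypothetical.

```python
import random

def majority_vote(votes, mask):
    """Predict 0/1 per sample by majority vote over the trees kept by mask."""
    chosen = [row for row, keep in zip(votes, mask) if keep]
    n = len(chosen)
    # Ties (possible with an even number of trees) resolve to class 0.
    return [1 if 2 * sum(col) > n else 0 for col in zip(*chosen)]

def metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return acc, sens, spec

def fitness(votes, mask, y_true):
    """Assumed fitness: accuracy of the masked ensemble (0 for an empty mask)."""
    if not any(mask):
        return 0.0
    return metrics(y_true, majority_vote(votes, mask))[0]

def ga_select(votes, y_true, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a tree subset (bit mask) with high majority-vote accuracy."""
    rng = random.Random(seed)
    n_trees = len(votes)
    # Seed the population with the full forest so the result is never worse.
    pop = [[1] * n_trees] + [
        [rng.randint(0, 1) for _ in range(n_trees)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(votes, m, y_true), reverse=True)
        elite = pop[: pop_size // 2]          # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_trees)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation adds or drops individual trees.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(votes, m, y_true))

# Synthetic stand-in for per-tree predictions on a validation set.
data_rng = random.Random(42)
y_val = [data_rng.randint(0, 1) for _ in range(200)]
tree_acc = [data_rng.uniform(0.5, 0.8) for _ in range(25)]
votes = [[y if data_rng.random() < a else 1 - y for y in y_val] for a in tree_acc]

full = [1] * len(votes)
best = ga_select(votes, y_val)
print("full forest:", len(full), "trees", metrics(y_val, majority_vote(votes, full)))
print("GA subset  :", sum(best), "trees", metrics(y_val, majority_vote(votes, best)))
```

In this sketch the mask is chosen and scored on the same samples, which overstates the gain. A setup closer to the 10-fold cross-validation described above would select the subset on training folds only and report accuracy, sensitivity, and specificity on the held-out fold.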