Feature Selection and Optimization of Random Forest Modeling

Min Zhu, Jing Xia, Mo Lei Yan, Sheng Yu Zhang, Guo Long Cai, Jing Yan, Gang Min Ning
DOI: https://doi.org/10.4028/www.scientific.net/amm.687-691.1416
2014-01-01
Applied Mechanics and Materials
Abstract: The traditional random forest algorithm struggles to classify small-sample data sets well. Because few samples are available during repeated random selection, the resulting trees differ very little from one another; correct decisions are drowned out, the generalization error of the model grows, and prediction accuracy drops. Given the small sample size of the sepsis case data, this paper adopts interval division when choosing the parameters used in random forest modeling: the feature range is divided into a high-correlation interval and an uncertain-correlation interval, and data are selected from each of the two intervals separately for modeling. This ultimately reduces the generalization error of the model and improves prediction accuracy.
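The interval idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how correlation is measured, where the boundary between the two intervals lies, or how many features are drawn from each, so everything below is an assumption made for illustration: absolute Pearson correlation with the label as the score, a hypothetical `threshold` of 0.5, and the hypothetical helpers `split_feature_intervals` and `draw_tree_features`. The toy data are invented and are not the sepsis data from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def split_feature_intervals(X, y, threshold=0.5):
    """Partition feature indices by |correlation with the label|.

    Assumption: features at or above `threshold` form the high-correlation
    interval; the rest form the uncertain-correlation interval.
    """
    high, uncertain = [], []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, y)) >= threshold:
            high.append(j)
        else:
            uncertain.append(j)
    return high, uncertain


def draw_tree_features(high, uncertain, n_high, n_uncertain, rng):
    """Draw one tree's candidate features from both intervals separately."""
    chosen = rng.sample(high, min(n_high, len(high)))
    chosen += rng.sample(uncertain, min(n_uncertain, len(uncertain)))
    return sorted(chosen)


# Invented toy data: 6 samples, 4 features, binary label.
X = [
    [1.0, 9.0, 0.5, 3.0],
    [2.0, 8.0, 0.1, 3.0],
    [3.0, 7.0, 0.9, 3.0],
    [7.0, 3.0, 0.4, 3.0],
    [8.0, 2.0, 0.2, 3.0],
    [9.0, 1.0, 0.8, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

high, uncertain = split_feature_intervals(X, y)
rng = random.Random(0)  # fixed seed so repeated runs draw the same subsets
subsets = [draw_tree_features(high, uncertain, 1, 1, rng) for _ in range(3)]
print("high-correlation interval:", high)
print("uncertain-correlation interval:", uncertain)
print("per-tree feature subsets:", subsets)
```

Each subset would then be handed to one tree as its candidate features. Drawing from both intervals keeps every tree anchored to at least one informative feature while still varying the rest, which is one plausible reading of how the approach increases the difference between trees on small samples.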