Three-way Selection Random Forest Algorithm Based on Decision Boundary Entropy

Zhang Chunying,Ren Jing,Liu Fengchun,Li Xiaoqi,Liu Shouyue
DOI: https://doi.org/10.1007/s10489-021-03033-7
IF: 5.3
2022-01-01
Applied Intelligence
Abstract:Aiming at the problem of high probability of negative impact about redundant attributes in random forest algorithms, a Three-way Selection Random Forest algorithm based on decision boundary entropy (TSRF) is proposed without losing randomness and reducing the influence of redundant attributes on decision-making results. According to the characteristics of the attribute, the concept of decision boundary entropy is defined. Then a measuring method of attribute importance based on decision boundary entropy is proposed and set as an evaluation standard. Three-way decision is constructed and the attribute is divided into three candidate domains, namely positive domain, negative domain and boundary domain. In order to ensure the randomness of attributes, three-way attribute random selection rules based on attribute randomness are established and a certain number of attributes are randomly selected from the three candidate domains. Combine the samples selected by the bootstrap sampling method with attribute sets selected by three-way decision to produce training sample sets so that we can train the decision trees and generate forest. Six datasets are selected for the experiment. Two parameters of attribute randomness and three-way decision thresholds are analyzed to verify the theoretical conclusions respectively. The results show that the TSRF algorithm can meet the different requirements of different data sets by adjusting the parameters. The classification effect on the binary data is basically the same as the comparison algorithm, but TSRF has a significant improvement effect on the multi-class data compared with other algorithms. The proposed TSRF algorithm widens the idea for the measurement method of significance of attribute, innovates the random forest three-way selection integration method, and provides a better model framework for solving multi-classification problems.
What problem does this paper attempt to address?