Arranged Forests: Enhancing Random Forests by Reducing Feature Overlap between Trees

Chu Luo, Yuehui Zhang
2020-01-01
Abstract: This paper proposes Arranged Forests, which aim to improve Random Forests by reducing the feature overlap between decision trees. The mathematics involved in our work is an "opposite" problem to that of extremal set theory. We establish a dual version of the famous Erdős-Ko-Rado theorem to settle the corresponding problem. To quantify the feature overlap in a forest, we introduce two measures: the pairwise repetition index and the total repetition index. For trees in Arranged Forests, we design two feature distribution algorithms that construct feature sets with the lowest total repetition index and a low pairwise repetition index. Based on mathematical analysis and empirical results, we show that Arranged Forests with certain parameters can achieve much lower repetition than Random Forests. Empirical results also show that Arranged Forests can match the average performance of Random Forests and significantly outperform the poorly performing models of Random Forests.
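The idea of measuring feature overlap between trees can be illustrated with a small sketch. The abstract does not give the formal definitions of the pairwise and total repetition index, so the code below uses stand-in assumptions: the overlap of two trees is the number of features their sets share, the "pairwise" figure is the largest such overlap, and the "total" figure is the sum over all pairs of trees. The function names and the round-robin arrangement are hypothetical illustrations, not the paper's algorithms, and assigning one fixed feature subset per tree is a simplification.

```python
import random
from itertools import combinations


def pairwise_overlap(a, b):
    """Number of features shared by two feature sets (assumed overlap measure)."""
    return len(set(a) & set(b))


def overlap_summary(feature_sets):
    """Return (largest pairwise overlap, sum of overlaps over all pairs).

    These are illustrative stand-ins, not the paper's repetition indices.
    """
    overlaps = [pairwise_overlap(a, b) for a, b in combinations(feature_sets, 2)]
    return max(overlaps, default=0), sum(overlaps)


def random_feature_sets(n_features, n_trees, k, seed=0):
    """Each tree draws k distinct features independently at random."""
    rng = random.Random(seed)
    return [rng.sample(range(n_features), k) for _ in range(n_trees)]


def round_robin_feature_sets(n_features, n_trees, k):
    """Hypothetical arrangement: hand out features in consecutive blocks,
    wrapping around, so that reuse is spread as evenly as possible.
    Assumes k <= n_features so each set holds distinct features."""
    return [[(t * k + j) % n_features for j in range(k)] for t in range(n_trees)]


if __name__ == "__main__":
    n_features, n_trees, k = 12, 4, 3
    print("random:     ", overlap_summary(random_feature_sets(n_features, n_trees, k)))
    print("round robin:", overlap_summary(round_robin_feature_sets(n_features, n_trees, k)))
```

With 12 features, 4 trees, and 3 features per tree, the round-robin sets are disjoint, so both figures are zero, whereas independently sampled sets will typically share some features.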