Bayesian post-hoc regularization of random forests

Bastian Pfeifer
2023-06-06
Abstract: Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks. However, they have a tendency to overfit noisy or irrelevant features, which can result in decreased generalization performance. Post-hoc regularization techniques aim to mitigate this issue by modifying the structure of the learned ensemble after training. Here, we propose Bayesian post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root, while reducing the impact of more specific and potentially noisy leaf nodes deeper in the tree. This approach allows for a form of pruning that does not alter the general structure of the trees but rather adjusts the influence of leaf nodes based on their proximity to the root node. We have evaluated the performance of our method on various machine learning data sets. Our approach demonstrates competitive performance with the state-of-the-art methods and, in certain cases, surpasses them in terms of predictive accuracy and generalization.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses overfitting in random forest models trained on data with noisy or irrelevant features. It proposes a Bayesian post-hoc regularization method that aims to improve generalization by adjusting the class probabilities stored in the leaf nodes of each decision tree. The core idea is to iteratively update a conjugate Beta prior along the path from the root node to each leaf node, so that nodes closer to the root carry greater weight in the final leaf estimate. This reduces the influence of overly specific and potentially noisy leaf nodes deep in the tree. The paper evaluates the method on four machine learning benchmark datasets and compares it with the existing Hierarchical Shrinkage method. The reported results show that Bayesian post-hoc regularization outperforms Hierarchical Shrinkage in terms of Balanced Accuracy and, in some cases, also in terms of ROC-AUC. Moreover, the method significantly improves the accuracy of the baseline random forest model in all tests.
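The root-to-leaf Beta update described above can be illustrated with a minimal sketch for binary classification. This is not the paper's implementation: the function name `regularized_leaf_probability`, the uniform Beta(1, 1) starting prior, and the choice to add each node's raw class counts to the Beta parameters are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def regularized_leaf_probability(
    path_counts: Iterable[Tuple[int, int]],
    alpha: float = 1.0,  # assumed prior pseudo-count for the positive class
    beta: float = 1.0,   # assumed prior pseudo-count for the negative class
) -> float:
    """Posterior mean of the positive-class probability at a leaf.

    path_counts holds (n_positive, n_negative) training counts for every
    node on the path, ordered from the root down to the leaf. The posterior
    at one node serves as the prior for the next node (Beta-Binomial
    conjugacy), so nodes with many samples, which sit near the root,
    dominate the final estimate.
    """
    for n_pos, n_neg in path_counts:
        alpha += n_pos
        beta += n_neg
    return alpha / (alpha + beta)


# Hypothetical path: root (50/50), one inner node (30/10), a pure leaf (5/0).
path = [(50, 50), (30, 10), (5, 0)]
raw_leaf = 5 / (5 + 0)                           # unregularized leaf estimate
regularized = regularized_leaf_probability(path)  # pulled toward ancestor nodes
print(raw_leaf, round(regularized, 3))
```

In this toy example the pure leaf, which an unregularized tree would score as 1.0, is pulled toward the class balance seen higher up the path. A real application would walk the fitted trees of a trained forest and overwrite each leaf's value with the regularized estimate, leaving the tree structure unchanged.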