Confidence intervals for the random forest generalization error

Paulo C. Marques F.
DOI: https://doi.org/10.48550/arXiv.2112.06101
2022-03-11
Abstract: We show that the byproducts of the standard training process of a random forest yield not only the well-known and almost computationally free out-of-bag point estimate of the model generalization error, but also a direct path to computing confidence intervals for the generalization error that avoids data splitting and model retraining. Besides their low computational cost, these confidence intervals are shown through simulations to have good coverage and an appropriate shrinking rate of their width in terms of the training sample size.
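For readers unfamiliar with the out-of-bag (OOB) point estimate mentioned in the abstract, the following is a minimal sketch of how such an estimate can arise as a byproduct of bagging. It is an illustration under stated assumptions, not the paper's procedure: decision stumps stand in for full random forest trees, the data are synthetic, and the names `fit_stump` and `oob_error` are hypothetical. The confidence interval construction itself is not described in the abstract and is not reproduced here.

```python
import random


def fit_stump(X, y):
    """Fit a decision stump (one feature, one threshold) by minimizing training errors.

    Assumption: a stump is used here only as a cheap stand-in for a full tree.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Try both label orientations around the threshold.
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[j] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def oob_error(X, y, n_estimators=25, seed=0):
    """Return (OOB misclassification rate, number of samples that received OOB votes)."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]  # per-sample OOB vote counts for classes 0 and 1
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        in_bag = set(idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        # Each learner votes only on samples it did not see during training.
        for i in range(n):
            if i not in in_bag:
                votes[i][stump(X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    # Majority vote among OOB learners; ties are resolved in favor of class 0.
    wrong = sum((votes[i][1] > votes[i][0]) != (y[i] == 1) for i in scored)
    return wrong / len(scored), len(scored)


# Synthetic binary classification data (an assumption made for illustration only).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
y = [1 if row[0] + 0.5 * data_rng.gauss(0, 1) > 0 else 0 for row in X]

err, n_scored = oob_error(X, y)
print(f"OOB error estimate: {err:.3f} (based on {n_scored} samples)")
```

No data are held out and no model is retrained: every training point is scored only by the learners whose bootstrap sample excluded it, which is why the estimate is described as almost computationally free.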
Machine Learning