Regression Trees for Fast and Adaptive Prediction Intervals

Luben M. C. Cabezas, Mateus P. Otto, Rafael Izbicki, Rafael B. Stern
2024-02-13
Abstract: Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional density estimators, aim to address this limitation. Although they are useful for creating prediction bands, these scores are detached from the original goal of quantifying the uncertainty around an arbitrary predictive model. This paper presents a new, model-agnostic family of methods to calibrate prediction intervals for regression problems with local coverage guarantees. Our approach is based on pursuing the coarsest partition of the feature space that approximates conditional coverage. We create this partition by training regression trees and Random Forests on conformity scores. Our proposal is versatile, as it applies to various conformity scores and prediction settings and demonstrates superior scalability and performance compared to established baselines in simulated and real-world datasets. We provide a Python package `clover` that implements our methods using the standard scikit-learn interface.
Machine Learning
What problem does this paper attempt to address?
### The problems the paper attempts to solve

This paper addresses uncertainty quantification for regression predictions. Specifically, the authors propose new methods for constructing prediction intervals with local coverage guarantees. Traditional methods often guarantee only marginal coverage and do not adapt to the local structure of the data, so prediction intervals for some subpopulations may be inaccurate or fail to cover the true values. This is a serious problem in practical applications, as it may lead to unfair decisions.

### Specific problem descriptions

1. **Limitations of prediction intervals**:
   - Traditional prediction interval methods usually require strong assumptions about the data-generating process to ensure correct coverage.
   - Conformal inference methods provide distribution-free marginal coverage, but the intervals they generate are often non-adaptive: coverage may vary across different regions of the feature space.
2. **The need for local coverage**:
   - Ideally, prediction intervals should achieve not only marginal coverage over the whole population but also local coverage in different regions of the feature space.
   - This means that intervals should cover the true values at the target rate within specific subpopulations, rather than performing well overall but poorly in some subpopulations.
3. **Deficiencies of existing methods**:
   - Most existing conformal inference methods that achieve conditional coverage asymptotically rely on new conformal scores that are detached from the estimated regression function, and so cannot construct prediction intervals around that function.
   - Non-conformal methods can provide local adaptivity, but often cannot provide valid prediction intervals in the finite-sample setting.
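The non-adaptivity of standard split conformal intervals described above can be illustrated with a minimal sketch. It is not taken from the paper: the synthetic heteroscedastic data, the linear base model, the absolute-residual score, and the helper name `conformal_quantile` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]


# Synthetic data (assumption): the noise standard deviation grows with x.
rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(0, 1, size=(n, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.1 + X[:, 0], size=n)

X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_test, y_test = X[2000:], y[2000:]

# Fit any point predictor, then calibrate on held-out data.
model = LinearRegression().fit(X_train, y_train)
scores = np.abs(y_cal - model.predict(X_cal))  # absolute-residual score
cutoff = conformal_quantile(scores, alpha=0.1)

# One global cutoff gives every test point the same interval width.
pred = model.predict(X_test)
lower, upper = pred - cutoff, pred + cutoff
covered = (y_test >= lower) & (y_test <= upper)

marginal_cov = covered.mean()
cov_low_noise = covered[X_test[:, 0] < 0.2].mean()
cov_high_noise = covered[X_test[:, 0] > 0.8].mean()
print(f"marginal: {marginal_cov:.3f}")
print(f"x < 0.2:  {cov_low_noise:.3f}")
print(f"x > 0.8:  {cov_high_noise:.3f}")
```

Because a single cutoff is used everywhere, the intervals over-cover where the noise is small and under-cover where it is large, even though marginal coverage stays close to the nominal 90%.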
### Solutions

The authors propose new methods based on regression trees and random forests, called Locart and Loforest, to solve the above problems. The main features of these methods are as follows:

1. **Locart**:
   - Partitions the feature space by training a regression tree and applies conformal inference within each partition element to estimate the cutoff values.
   - This ensures that the prediction intervals have good local coverage in different regions of the feature space.
2. **Loforest**:
   - Defines prediction intervals based on a random forest of multiple regression trees.
   - Although Loforest is not a conformal method, it exhibits excellent conditional coverage in practice.
3. **Enhanced versions**:
   - A-Locart and A-Loforest further refine the partitions by augmenting the feature space.
   - W-Loforest is a weighted version of Loforest that can be used to improve locally weighted prediction intervals.

### Summary

This paper addresses the local-coverage deficiencies of traditional prediction interval methods by proposing Locart and Loforest, which provide more accurate and more adaptive prediction intervals and thereby improve the reliability and fairness of predictions in practical applications.
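The Locart idea summarized above (partition the feature space with a regression tree fitted to conformity scores, then calibrate a cutoff within each partition element) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation and not the `clover` API: it uses only scikit-learn, and the function names, the synthetic data, the `min_samples_leaf` value, and the choice to split the calibration data into one part for growing the tree and another for estimating cutoffs are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]


def fit_local_cutoffs(X_part, s_part, X_cut, s_cut, alpha, min_samples_leaf=100):
    """Grow a tree on conformity scores, then compute one cutoff per leaf."""
    tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf, random_state=0)
    tree.fit(X_part, s_part)            # leaves group points with similar scores
    leaves = tree.apply(X_cut)          # leaf index of each cutoff-set point
    cutoffs = {leaf: conformal_quantile(s_cut[leaves == leaf], alpha)
               for leaf in np.unique(leaves)}
    fallback = conformal_quantile(s_cut, alpha)  # for leaves with no cutoff data
    return tree, cutoffs, fallback


def local_intervals(model, tree, cutoffs, fallback, X_new):
    """Interval around the point prediction, with a leaf-specific half-width."""
    pred = model.predict(X_new)
    t = np.array([cutoffs.get(leaf, fallback) for leaf in tree.apply(X_new)])
    return pred - t, pred + t


# Synthetic heteroscedastic data (assumption): noise grows with x.
rng = np.random.default_rng(0)
n = 6000
X = rng.uniform(0, 1, size=(n, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.1 + X[:, 0], size=n)
X_train, y_train = X[:1000], y[:1000]
X_part, y_part = X[1000:2500], y[1000:2500]   # used to grow the partition
X_cut, y_cut = X[2500:4000], y[2500:4000]     # used to estimate cutoffs
X_test, y_test = X[4000:], y[4000:]

model = LinearRegression().fit(X_train, y_train)
s_part = np.abs(y_part - model.predict(X_part))
s_cut = np.abs(y_cut - model.predict(X_cut))

tree, cutoffs, fallback = fit_local_cutoffs(X_part, s_part, X_cut, s_cut, alpha=0.1)
lower, upper = local_intervals(model, tree, cutoffs, fallback, X_test)

covered = (y_test >= lower) & (y_test <= upper)
width = upper - lower
low, high = X_test[:, 0] < 0.2, X_test[:, 0] > 0.8
marginal_cov = covered.mean()
cov_high = covered[high].mean()
print(f"marginal coverage: {marginal_cov:.3f}")
print(f"coverage for x > 0.8: {cov_high:.3f}")
print(f"mean width x < 0.2: {width[low].mean():.2f}, x > 0.8: {width[high].mean():.2f}")
```

In this sketch the intervals stay centered on the base model's prediction, while their width changes from leaf to leaf. A forest-based variant in the spirit of Loforest would combine cutoffs across many such trees; the exact combination rule is not specified in the summary above, so it is left out here.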