Abstract: Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional density estimators, aim to address this limitation. Although they are useful for creating prediction bands, these scores are detached from the original goal of quantifying the uncertainty around an arbitrary predictive model. This paper presents a new, model-agnostic family of methods to calibrate prediction intervals for regression problems with local coverage guarantees. Our approach is based on pursuing the coarsest partition of the feature space that approximates conditional coverage. We create this partition by training regression trees and Random Forests on conformity scores. Our proposal is versatile, as it applies to various conformity scores and prediction settings and demonstrates superior scalability and performance compared to established baselines in simulated and real-world datasets. We provide a Python package clover that implements our methods using the standard scikit-learn interface.
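The core idea of training a regression tree on conformity scores and calibrating within each leaf can be sketched with scikit-learn. This is a minimal illustration under our own assumptions, not the clover package's API: `LocartSketch`, `conformal_cutoff`, the absolute-residual score, the toy data, and the choice to fit the tree and the per-leaf cutoffs on disjoint halves of the calibration set are all hypothetical choices made here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def conformal_cutoff(scores, alpha):
    """Split-conformal cutoff: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else scores[k - 1]


class LocartSketch:
    """Hypothetical LOCART-style calibrator; not the clover package API."""

    def __init__(self, alpha=0.1, min_samples_leaf=200, seed=0):
        self.alpha = alpha
        self.seed = seed
        self.tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf, random_state=seed)

    def fit(self, X_cal, scores_cal):
        # Assumption: fit the partition and the cutoffs on disjoint halves.
        rng = np.random.default_rng(self.seed)
        idx = rng.permutation(len(scores_cal))
        part, calib = idx[: len(idx) // 2], idx[len(idx) // 2 :]
        # Regression tree on conformity scores -> partition of the feature space.
        self.tree.fit(X_cal[part], scores_cal[part])
        leaves = self.tree.apply(X_cal[calib])
        s = scores_cal[calib]
        # One split-conformal cutoff per leaf, from the held-out half.
        self.cutoffs_ = {
            leaf: conformal_cutoff(s[leaves == leaf], self.alpha) for leaf in np.unique(leaves)
        }
        return self

    def cutoff(self, X):
        # Leaves with no calibration points get an infinite (trivially valid) cutoff.
        return np.array([self.cutoffs_.get(leaf, np.inf) for leaf in self.tree.apply(X)])


rng = np.random.default_rng(1)


def sample(n):
    # Toy heteroscedastic data: noise standard deviation grows with x.
    X = rng.uniform(0, 1, size=(n, 1))
    y = 2 * X[:, 0] + rng.normal(scale=0.1 + X[:, 0], size=n)
    return X, y


X_tr, y_tr = sample(2000)
X_cal, y_cal = sample(4000)
X_te, y_te = sample(5000)

model = LinearRegression().fit(X_tr, y_tr)  # any point predictor works here
scores_cal = np.abs(y_cal - model.predict(X_cal))  # regression-residual score
loc = LocartSketch(alpha=0.1).fit(X_cal, scores_cal)

t = loc.cutoff(X_te)
pred = model.predict(X_te)
lower, upper = pred - t, pred + t
coverage = np.mean((y_te >= lower) & (y_te <= upper))
```

On this toy problem the intervals should be narrower where the noise is small and wider where it is large, while the per-leaf cutoffs keep overall coverage close to the nominal 90%.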

What problem does this paper attempt to address?

### The problem the paper attempts to solve

This paper addresses uncertainty quantification for regression predictions. Specifically, the authors propose a new method for constructing prediction intervals with local coverage guarantees. Traditional methods typically guarantee only marginal coverage and do not adapt to the local structure of the data, so for some sub-populations the prediction intervals may be inaccurate or fail to cover the true values. This is a serious problem in practice because it can lead to unfair decisions.

### Specific problem descriptions

1. **Limitations of prediction intervals**:
   - Traditional prediction-interval methods usually require strong assumptions about the data-generating process to guarantee correct coverage.
   - Conformal inference provides distribution-free marginal coverage, but the intervals it produces are often non-adaptive: coverage can vary across the feature space.
2. **The need for local coverage**:
   - Ideally, prediction intervals should achieve not only marginal coverage but also local coverage in different regions of the feature space.
   - That is, intervals should cover the true values for specific sub-populations, rather than performing well overall but poorly for some sub-populations.
3. **Deficiencies of existing methods**:
   - Many existing conformal methods achieve conditional coverage asymptotically, but they rely on new conformal scores that do not depend on the estimated regression function, so they cannot build prediction intervals around that function.
   - Non-conformal methods can adapt locally, but often fail to provide valid prediction intervals in finite samples.
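The gap between marginal and local coverage can be made concrete with a small diagnostic. The sketch below builds a single, non-adaptive interval width tuned for about 90% coverage on pooled data, then measures coverage within each sub-population. The helper `coverage_by_group` and the two-group toy data are hypothetical illustrations, not part of the paper.

```python
import numpy as np


def coverage_by_group(y, lower, upper, groups):
    """Empirical coverage overall and within each group label."""
    covered = (y >= lower) & (y <= upper)
    per_group = {g: covered[groups == g].mean() for g in np.unique(groups)}
    return covered.mean(), per_group


# Toy example: two sub-populations with very different noise levels.
rng = np.random.default_rng(0)
n = 10_000
groups = rng.integers(0, 2, size=n)  # 0 = low noise, 1 = high noise
y = rng.normal(scale=np.where(groups == 0, 0.5, 2.0))
pred = np.zeros(n)  # the point prediction (here the true mean)

# Non-adaptive interval: one half-width, tuned on the pooled data for ~90% coverage.
half_width = np.quantile(np.abs(y - pred), 0.9)
marginal, per_group = coverage_by_group(y, pred - half_width, pred + half_width, groups)
```

With these settings the pooled coverage is close to 90%, while the low-noise group is almost always covered and the high-noise group is covered only about 80% of the time.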
### Solutions

The authors propose new methods based on regression trees and random forests, called Locart and Loforest, to address these problems. Their main features are:

1. **Locart**:
   - Partitions the feature space by training a regression tree on conformity scores, then applies conformal inference within each cell of the partition to estimate the cutoffs.
   - This gives prediction intervals with good local coverage across different regions of the feature space.
2. **Loforest**:
   - Defines prediction intervals using a random forest, i.e., an ensemble of regression trees.
   - Although Loforest is not a conformal method, it achieves strong conditional coverage in practice.
3. **Enhanced versions**:
   - A-Locart and A-Loforest refine the partitions further by augmenting the feature space.
   - W-Loforest, a weighted version of Loforest, is intended to improve locally weighted prediction intervals.

### Summary

By proposing Locart and Loforest, the paper addresses the shortcomings of traditional prediction-interval methods with respect to local coverage, yielding more accurate and more adaptive prediction intervals and thereby improving the reliability and fairness of predictions in practice.
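Extending the tree idea to a forest can be sketched as follows: fit a random forest on conformity scores, compute a split-conformal cutoff for every leaf of every tree, and combine the per-tree cutoffs at a new point. Averaging the per-tree cutoffs is an assumption made here for illustration; the paper's exact aggregation rule may differ. An averaged cutoff no longer carries the finite-sample guarantee of a single tree, which is consistent with the description of Loforest as not a conformal method. All function names, parameters, and the toy data below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def conformal_cutoff(scores, alpha):
    """Split-conformal cutoff: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else scores[k - 1]


def fit_forest_cutoffs(X_cal, scores_cal, alpha=0.1, n_trees=50, min_samples_leaf=200, seed=0):
    """Hypothetical LOFOREST-style calibration: one leaf-cutoff table per tree."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(scores_cal))
    part, calib = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    # Random forest on conformity scores; each tree induces its own partition.
    forest = RandomForestRegressor(
        n_estimators=n_trees, min_samples_leaf=min_samples_leaf, random_state=seed
    ).fit(X_cal[part], scores_cal[part])
    leaves = forest.apply(X_cal[calib])  # shape (n_calib, n_trees)
    s = scores_cal[calib]
    tables = [
        {leaf: conformal_cutoff(s[leaves[:, b] == leaf], alpha) for leaf in np.unique(leaves[:, b])}
        for b in range(n_trees)
    ]
    return forest, tables


def forest_cutoff(forest, tables, X):
    """Assumed aggregation: average the per-tree leaf cutoffs at each x."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees)
    per_tree = np.array(
        [[tables[b].get(leaf, np.inf) for b, leaf in enumerate(row)] for row in leaves]
    )
    return per_tree.mean(axis=1)


rng = np.random.default_rng(1)


def sample(n):
    # Toy heteroscedastic data: noise standard deviation grows with x.
    X = rng.uniform(0, 1, size=(n, 1))
    y = 2 * X[:, 0] + rng.normal(scale=0.1 + X[:, 0], size=n)
    return X, y


X_tr, y_tr = sample(2000)
X_cal, y_cal = sample(4000)
X_te, y_te = sample(5000)

model = LinearRegression().fit(X_tr, y_tr)
scores_cal = np.abs(y_cal - model.predict(X_cal))  # regression-residual score
forest, tables = fit_forest_cutoffs(X_cal, scores_cal, alpha=0.1)

t = forest_cutoff(forest, tables, X_te)
pred = model.predict(X_te)
coverage = np.mean(np.abs(y_te - pred) <= t)
```

Averaging over many trees smooths the piecewise-constant cutoffs of a single tree, so the interval width varies more gradually with x.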

Regression Trees for Fast and Adaptive Prediction Intervals


Valid prediction intervals for regression problems

Adaptive Conformal Regression with Jackknife+ Rescaled Scores

Regression Conformal Prediction under Bias

Adaptive Conformal Prediction by Reweighting Nonconformity Score

Conformalized Selective Regression

Building Conformal Prediction Intervals with Approximate Message Passing

Learning Prediction Intervals for Regression: Generalization and Calibration

Using random forest for reliable classification and cost-sensitive learning for medical diagnosis

Conformal Prediction using Conditional Histograms

Conformal Prediction Intervals with Temporal Dependence

Conformal Thresholded Intervals for Efficient Regression

Adaptive Conformal Prediction Intervals Using Data-Dependent Weights With Application to Seismic Response Prediction

Boosted Conformal Prediction Intervals

Distribution-Free Prediction Intervals Under Covariate Shift, With an Application to Causal Inference

Conformal prediction with local weights: randomization enables local guarantees

Adjusting Regression Models for Conditional Uncertainty Calibration

A Conformal Prediction Approach to Explore Functional Data

Self-Calibrating Conformal Prediction