Selective conformal inference with false coverage-statement rate control

Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou
DOI: https://doi.org/10.1093/biomet/asae010
2023-01-02
Abstract: Conformal inference is a popular tool for constructing prediction intervals (PIs). We consider here the scenario of post-selection/selective conformal inference, that is, PIs are reported only for individuals selected from unlabeled test data. To account for multiplicity, we develop a general split conformal framework to construct selective PIs with false coverage-statement rate (FCR) control. We first investigate Benjamini and Yekutieli (2005)'s FCR-adjusted method in the present setting, and show that it is able to achieve FCR control but yields uniformly inflated PIs. We then propose a novel solution to the problem, named Selective COnditional conformal Predictions (SCOP), which entails performing selection procedures on both the calibration set and the test set and constructing marginal conformal PIs on the selected sets with the aid of the conditional empirical distribution obtained from the calibration set. Under a unified framework and exchangeable assumptions, we show that SCOP can exactly control the FCR. More importantly, we provide non-asymptotic miscoverage bounds for a general class of selection procedures beyond exchangeability and discuss the conditions under which SCOP is able to control the FCR. As special cases, SCOP with quantile-based selection or conformal p-value-based multiple testing procedures enjoys a valid coverage guarantee under mild conditions. Numerical results confirm the effectiveness and robustness of SCOP in FCR control and show that it achieves narrower PIs than existing methods in many settings.
Methodology
What problem does this paper attempt to address?
This paper addresses how to control the false coverage-statement rate (FCR) in selective conformal inference. Specifically, when individuals are selected from unlabeled test data and prediction intervals are reported only for them, how can the FCR of those intervals be kept at a target level? The paper proposes a new method, Selective Conditional Conformal Prediction (SCOP), which constructs prediction intervals from the conditional empirical distribution on the post-selection calibration set and thereby controls the FCR exactly under exchangeability, in a model-agnostic and distribution-free manner.

### Background and Problem Description of the Paper

- **Background**: Conformal inference is a widely used tool for constructing prediction intervals. In post-selection (selective) conformal inference, where prediction intervals are reported only for individuals selected from unlabeled test data, multiplicity must be taken into account. Without an appropriate adjustment, the average coverage of the post-selection prediction intervals may be significantly lower than the nominal confidence level.
- **Problem**: How to control the false coverage-statement rate (FCR) in selective conformal inference so that the reported prediction intervals remain reliable.

### Main Contributions

1. **Research on FCR Adjustment Methods**: The paper first studies the FCR-adjusted method of Benjamini and Yekutieli (2005) in the conformal setting and proves that it controls the FCR if the selection rule does not depend on the calibration set. As the abstract notes, the resulting intervals are uniformly inflated.
2. **Selective Conditional Conformal Prediction (SCOP)**: The paper proposes SCOP, which constructs prediction intervals from the conditional empirical distribution on the post-selection calibration set. When the selection rule is exchangeable, the method controls the FCR exactly.
3. **FCR Control under Non-exchangeable Selection Rules**: For non-exchangeable selection rules that depend on the calibration set, the paper provides non-asymptotic FCR control bounds.
4. **Verification by Numerical Experiments**: Numerical results show that the proposed method controls the FCR more accurately than existing methods and can provide narrower prediction intervals.

### Method Overview

- **Selective Conditional Conformal Prediction (SCOP)**:
  - **Steps**:
    1. **Data Splitting and Training**: Split the labeled data set \(D_l\) into a training set \(D_t\) and a calibration set \(D_c\), and fit the prediction model \(\hat{\mu}(X)\) and the scoring function \(g(X)\) on the training set.
    2. **Selection**: Compute the scores \(T_i = g(X_i)\) for the calibration and test points, apply the selection rule \(S\) to determine the selection threshold \(\hat{\tau}\), and use it to obtain the post-selection test subset \(\hat{S}_u\) and calibration subset \(\hat{S}_c\).
    3. **Calibration**: Compute the residuals \(R_i = |Y_i - \hat{\mu}(X_i)|\) for the points in the selected calibration subset \(\hat{S}_c\).
    4. **Construct Prediction Intervals**: For each selected test point \(j \in \hat{S}_u\), construct the prediction interval \(\mathrm{PI}_j^{\mathrm{SCOP}} = \hat{\mu}(X_j) \pm Q_\alpha(\{R_i\}_{i \in \hat{S}_c})\).

### Mathematical Formulas

- **Selective Conditional Conformal Prediction Interval**:

\[ \mathrm{PI}_j^{\mathrm{SCOP}} = \hat{\mu}(X_j) \pm Q_\alpha(\{R_i\}_{i \in \hat{S}_c}) \]

where \(Q_\alpha(\{R_i\}_{i \in \hat{S}_c})\) denotes the \(\lceil (1 - \alpha)(|\hat{S}_c| + 1) \rceil\)-th smallest value (order statistic) of \(\{R_i\}_{i \in \hat{S}_c}\).
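
The selection, calibration and interval-construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `scop_intervals` and its arguments are hypothetical, the threshold `tau` is assumed to be supplied by the caller (how the selection rule computes \(\hat{\tau}\) is left out), a point is assumed to be selected when its score satisfies \(T \le \tau\), the order statistic uses the usual split-conformal rank \(\lceil (1 - \alpha)(|\hat{S}_c| + 1) \rceil\), and an infinite half-width is returned when the selected calibration subset is too small for that rank to exist.

```python
import math


def scop_intervals(mu_cal, y_cal, t_cal, mu_test, t_test, tau, alpha=0.1):
    """Sketch of SCOP steps 2-4 for a hypothetical rule that selects T <= tau.

    mu_cal, mu_test: model predictions on the calibration / test points.
    y_cal: calibration labels. t_cal, t_test: selection scores T_i = g(X_i).
    Returns (selected test indices, list of (lower, upper) intervals, half-width).
    """
    # Step 2: apply the same threshold to calibration and test scores.
    cal_sel = [i for i, t in enumerate(t_cal) if t <= tau]
    test_sel = [j for j, t in enumerate(t_test) if t <= tau]

    # Step 3: absolute residuals on the selected calibration points only.
    residuals = sorted(abs(y_cal[i] - mu_cal[i]) for i in cal_sel)

    # Step 4: take the k-th smallest residual, k = ceil((1 - alpha) * (n + 1)).
    n = len(residuals)
    k = math.ceil((1 - alpha) * (n + 1))
    # Assumption: if k exceeds n, fall back to an infinite half-width.
    q = residuals[k - 1] if k <= n else math.inf

    intervals = [(mu_test[j] - q, mu_test[j] + q) for j in test_sel]
    return test_sel, intervals, q


# Toy usage with made-up numbers (not data from the paper).
idx, pis, q = scop_intervals(
    mu_cal=[1.5, 0.0, 4.0, 3.0, 0.0],
    y_cal=[1.0, 5.0, 2.0, 4.0, 0.0],
    t_cal=[0.1, 0.9, 0.2, 0.3, 0.8],
    mu_test=[10.0, 20.0, 30.0],
    t_test=[0.4, 0.6, 0.5],
    tau=0.5,
    alpha=0.5,
)
print(idx, pis, q)  # [0, 2] [(9.0, 11.0), (29.0, 31.0)] 1.0
```

In the toy call, three calibration points pass the threshold, with residuals 0.5, 1.0 and 2.0. With \(\alpha = 0.5\) the rank is \(\lceil 0.5 \cdot 4 \rceil = 2\), so the half-width is the second-smallest residual, 1.0. Only the two selected test points receive an interval.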