Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage

Ying Jin, Zhimei Ren
2024-03-24
Abstract: Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn new test point with a prescribed probability. However, a common scenario in practice is that, after seeing the data, practitioners decide which test unit(s) to focus on in a data-driven manner and seek uncertainty quantification for the focal unit(s). In such cases, marginally valid conformal prediction intervals may not provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage conditional on the unit being selected by a given procedure. The general form of our method works for arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We then work out the computationally efficient implementation of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
Machine Learning, Statistics Theory
What problem does this paper attempt to address?
The paper addresses how to construct prediction sets whose coverage probability reaches the prescribed level on units that were selected in a data-driven way. Standard conformal prediction guarantees only marginal validity: the prediction set covers the true outcome of a randomly drawn new test point with a prescribed probability. In practice, however, researchers and decision-makers often care only about a specific subset of the test units, such as the drug candidates with the highest predicted affinity or the products showing strong demand. If standard prediction sets are used for these focal units, their coverage may fall below the nominal level because of selection bias, which can mislead decision-making.
To meet this challenge, the paper proposes a new framework, Joint Mondrian Conformal Inference (JOMI), for constructing prediction sets that retain valid coverage conditional on a given selection event. The core idea of JOMI is to find a "reference set", a data-dependent subset of the calibration data whose points remain exchangeable with the new test point even after conditioning on the selection event. Calibrating on this reference set gives calibrated uncertainty quantification for the selected units and hence valid selection-conditional coverage.
The main contributions of the paper include:
- Proposing the JOMI framework, which provides finite-sample exact coverage conditional on selection and applies to any selection rule that is invariant to permutations of the calibration units.
- Studying the computational aspects of JOMI and showing that it can be implemented efficiently when the target variable takes values in a finite set.
- Providing efficient implementations for several practical selection rules, including covariate-based selection and selection based on preliminary conformal prediction sets.
- Verifying the effectiveness of JOMI through applications in drug discovery and health risk prediction.
In summary, the paper addresses the failure of marginally valid prediction sets when attention is restricted to selected units, and proposes a framework that guarantees valid coverage conditional on the selection.
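The selection bias described above, and the role of a reference set, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's JOMI algorithm: the data-generating model, the fixed predictor mu(x) = x, the absolute-residual score, and the helper names `conformal_quantile` and `simulate` are all hypothetical choices made for illustration. The selection rule is a fixed covariate threshold, a simple special case in which the calibration points that satisfy the same rule can serve as the reference set (a Mondrian-style calibration). Data-dependent rules such as top-K need the paper's more careful construction.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score (inf if it does not exist)."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(scores)[k - 1]


def simulate(n_cal=500, n_test=200, alpha=0.1, threshold=0.8, n_rep=300, seed=0):
    """Average coverage of split-conformal intervals, marginally and on selected units."""
    rng = np.random.default_rng(seed)
    cover = {"marginal": [], "naive_selected": [], "reference_selected": []}
    for _ in range(n_rep):
        x_cal = rng.uniform(0.0, 1.0, n_cal)
        x_test = rng.uniform(0.0, 1.0, n_test)
        # Assumed model: noise scale grows with x, so high-x units are harder to predict.
        y_cal = x_cal + (0.1 + x_cal) * rng.standard_normal(n_cal)
        y_test = x_test + (0.1 + x_test) * rng.standard_normal(n_test)

        # Nonconformity score: absolute residual of the (assumed pre-trained) predictor mu(x) = x.
        s_cal = np.abs(y_cal - x_cal)
        s_test = np.abs(y_test - x_test)

        # Hypothetical selection rule: focus on units with a high predicted value.
        sel_cal = x_cal > threshold
        sel_test = x_test > threshold

        q_all = conformal_quantile(s_cal, alpha)           # calibrate on all points
        q_ref = conformal_quantile(s_cal[sel_cal], alpha)  # calibrate on the reference set

        # The interval mu(x) +/- q covers y exactly when the test score is at most q.
        cover["marginal"].append(np.mean(s_test <= q_all))
        if sel_test.any():
            cover["naive_selected"].append(np.mean(s_test[sel_test] <= q_all))
            cover["reference_selected"].append(np.mean(s_test[sel_test] <= q_ref))
    return {name: float(np.mean(vals)) for name, vals in cover.items()}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

With a nominal level of 0.9, the marginal coverage should come out close to 0.9, the coverage of the standard intervals on the selected units should fall clearly below it, and calibrating on the reference set should bring the coverage on the selected units back to about the nominal level.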