Building Conformal Prediction Intervals with Approximate Message Passing

Lucas Clarté, Lenka Zdeborová
2024-10-22
Abstract: Conformal prediction has emerged as a powerful tool for building prediction intervals that are valid in a distribution-free way. However, its evaluation may be computationally costly, especially in the high-dimensional setting where the dimensionality and sample size are both large and of comparable magnitude. To address this challenge in the context of generalized linear regression, we propose a novel algorithm based on Approximate Message Passing (AMP) to accelerate the computation of prediction intervals using full conformal prediction, by approximating the computation of conformity scores. Our work bridges a gap between modern uncertainty quantification techniques and tools for high-dimensional problems involving the AMP algorithm. We evaluate our method on both synthetic and real data, and show that it produces prediction intervals that are close to those of the baseline methods, while being orders of magnitude faster. Additionally, in the high-dimensional limit and under assumptions on the data distribution, the conformity scores computed by AMP converge to those computed exactly, which allows theoretical study and benchmarking of conformal methods in high dimensions.
Machine Learning, Disordered Systems and Neural Networks
What problem does this paper attempt to address?
The problem this paper addresses is how to efficiently construct distribution-free prediction intervals in high-dimensional settings. Specifically:

1. **Computational complexity problem**: The traditional Full Conformal Prediction (FCP) method is very expensive in high dimensions because it must fit a model for each candidate label and compute leave-one-out residuals. The cost is most severe when the number of samples \(n\) and the feature dimension \(d\) are both large and of comparable magnitude.
2. **Accelerating prediction interval calculation**: To meet this challenge, the authors propose a new method based on the Approximate Message Passing (AMP) algorithm to accelerate the computation of full conformal prediction intervals in generalized linear regression. AMP achieves this by approximating the conformity scores.
3. **Theoretical guarantee and practical performance**: Beyond the speed-up, the authors show that in the high-dimensional limit, and under assumptions on the data distribution, the conformity scores computed by AMP converge to the exact leave-one-out scores, which supports a theoretical coverage guarantee. Experiments on both synthetic and real data show that the method produces prediction intervals close to those of the baseline methods while being several orders of magnitude faster.

### Specific problem description

- **Objective**: To quickly and accurately construct distribution-free prediction intervals for high-dimensional data.
- **Challenge**: Traditional methods such as FCP are too computationally expensive in high dimensions.
- **Solution**: Use the AMP algorithm to approximate the leave-one-out residuals, which accelerates the computation of prediction intervals while maintaining a theoretical coverage guarantee.

### Main contributions

1. **Applying AMP to accelerate FCP**: For the first time, AMP is applied to full conformal prediction in generalized linear regression, and the computation is accelerated by approximating all leave-one-out estimators simultaneously.
2. **Introducing Taylor-AMP**: The computation is accelerated further by removing the need to fit a model for each candidate label, using a Taylor expansion to approximate the conformity scores.
3. **Theoretical analysis**: In the high-dimensional limit, the conformity scores computed by AMP are proved to converge to the exact leave-one-out scores, which allows conformal prediction to be studied in high dimensions and provides a benchmark for other methods.

Through these contributions, the authors open up a new direction that makes high-dimensional statistical methods practically applicable to uncertainty quantification, especially in fields such as genomics and MRI reconstruction.
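
To make the cost described above concrete, here is a minimal sketch of brute-force full conformal prediction with leave-one-out residuals as conformity scores. It is not the authors' AMP method. It assumes ridge regression as the model, absolute residuals as the scores, and a finite grid of candidate labels; the names `ridge_fit`, `loo_scores` and `full_conformal_set` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def loo_scores(X, y, lam):
    """Absolute leave-one-out residuals, computed with one refit per sample."""
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # hold out sample i
        w = ridge_fit(X[keep], y[keep], lam)
        scores[i] = abs(y[i] - X[i] @ w)
    return scores


def full_conformal_set(X, y, x_new, candidates, alpha=0.1, lam=1.0):
    """Return the candidate labels retained at miscoverage level alpha."""
    X_aug = np.vstack([X, x_new])         # training data plus the test point
    kept = []
    for y_cand in candidates:             # one pass per candidate label
        y_aug = np.append(y, y_cand)
        s = loo_scores(X_aug, y_aug, lam) # n + 1 refits for this candidate
        # Conformal p-value: fraction of scores at least as large as the
        # test point's score (the test point itself is included).
        p_value = np.mean(s >= s[-1])
        if p_value > alpha:
            kept.append(y_cand)
    return np.array(kept)


# Synthetic data (assumed Gaussian design and noise, for illustration only).
rng = np.random.default_rng(0)
n, d, lam = 40, 5, 1.0
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)
x_new = rng.normal(size=d)

# Centre the candidate grid on the ridge prediction for the test point.
center = x_new @ ridge_fit(X, y, lam)
candidates = center + np.linspace(-5.0, 5.0, 81)

kept = full_conformal_set(X, y, x_new, candidates, alpha=0.1, lam=lam)
interval = (kept.min(), kept.max())
print("prediction interval:", interval)
```

With `n = 40` samples and 81 candidate labels, this sketch already performs 81 × 41 model fits for a single test point. This is the nested cost that, according to the summary above, AMP reduces by approximating all leave-one-out estimators at once and Taylor-AMP reduces further by avoiding a refit per candidate label.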