Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

Isaac Gibbs, Emmanuel Candès
2023-10-06
Abstract: We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive conformal inference (ACI) algorithm of Gibbs and Candès (2021) to contain an additional step in which the step-size parameter of ACI's gradient descent update is tuned over time. Crucially, this means that unlike ACI, which requires knowledge of the rate of change of the data-generating mechanism, our new procedure is adaptive to both the size and type of the distribution shift. Our methods are highly flexible and can be used in combination with any baseline predictive algorithm that produces point estimates or estimated quantiles of the target without the need for distributional assumptions. We test our techniques on two real-world datasets aimed at predicting stock market volatility and COVID-19 case counts and find that they are robust and adaptive to real-world distribution shifts.
Methodology, Machine Learning
What problem does this paper attempt to address?
This paper addresses online prediction when the distribution generating the data changes over time. Specifically, it asks how to construct reliable prediction sets under such shifts. Earlier methods over-weight historical data and therefore react slowly to changes in the underlying dynamics. To fix this, the paper modifies the Adaptive Conformal Inference (ACI) algorithm so that it adapts to arbitrary distribution shifts and has provably small regret over all local time intervals of a given width.

### Main contributions of the paper

1. **Improvement of the ACI algorithm**: The paper adds a step that tunes the step-size parameter of ACI's gradient-descent update over time. As a result, the new method adapts to both the size and the type of the distribution shift, and it does not need to know the rate of change of the data-generating mechanism in advance.
2. **Flexibility and applicability**: The method can be combined with any baseline predictive algorithm that produces point estimates or estimated quantiles of the target, without distributional assumptions.
3. **Empirical verification**: The authors test their techniques on two real-world datasets, one on predicting stock market volatility and one on COVID-19 case counts, and find that the methods are robust and adaptive to real-world distribution shifts.

### Core problem

The core problem is how to construct prediction sets that remain reliable when the data distribution changes over time. Because earlier methods rely too heavily on historical data, they respond slowly to sudden changes, so a method that adapts quickly is needed.
### Formula representation

The key formulas involved in the paper include:

- **Conformity score**:
  \[ S((X_j, Y_j)_{1\leq j\leq n}, (X_{n + 1}, y)) := |y-\hat{\mu}(X_{n + 1})| \]
  where \(\hat{\mu}\) is a regression model fitted on the training data.
- **Prediction set**:
  \[ \hat{C}_{n + 1} := \left\{y : S_y^{n + 1}\leq \text{Quantile}\left(1-\alpha,\frac{1}{n + 1}\sum_{i = 1}^{n + 1}\delta_{S_y^i}\right)\right\} \]
  where \(S_y^i\) is the conformity score of the \(i\)-th point when the candidate value \(y\) is imputed for \(Y_{n + 1}\), and \(\delta_s\) is a point mass at \(s\).
- **ACI update rule**:
  \[ \alpha_{t + 1}=\alpha_t+\gamma(\alpha-\text{err}_t) \]
  where \(\alpha\) is the target miscoverage level, \(\gamma > 0\) is the step size, and \(\text{err}_t\) equals 1 if the prediction set at time \(t\) failed to cover \(Y_t\) and 0 otherwise.

Together, these changes yield a flexible online prediction method whose prediction sets remain reliable as the data distribution changes.
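The ACI update rule above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `run_aci`, the `warmup` parameter, and the choice to start at \(\alpha_1 = \alpha\) are hypothetical, and the threshold is a plain empirical quantile of past absolute-residual scores rather than the exact finite-sample quantile in the prediction-set formula. The sketch covers only the fixed-step-size ACI update; it does not implement the paper's step-size tuning.

```python
import numpy as np


def run_aci(scores, alpha=0.1, gamma=0.01, warmup=50):
    """Apply the ACI update to a stream of conformity scores.

    scores[t] plays the role of |Y_t - mu_hat(X_t)|.
    Returns the alpha_t values used and the 0/1 miscoverage indicators err_t.
    """
    scores = np.asarray(scores, dtype=float)
    alpha_t = alpha  # assumption: start at the target miscoverage level
    alphas, errs = [], []
    for t in range(warmup, len(scores)):
        past = scores[:t]
        if alpha_t <= 0.0:
            # Level at or below 0: the set is the whole real line, so it always covers.
            threshold = np.inf
        elif alpha_t >= 1.0:
            # Level at or above 1: the set is empty, so it never covers.
            threshold = -np.inf
        else:
            # Simplification: empirical (1 - alpha_t) quantile of past scores.
            threshold = np.quantile(past, 1.0 - alpha_t)
        # err_t = 1 when Y_t falls outside the set {y : |y - mu_hat(X_t)| <= threshold}.
        err_t = float(scores[t] > threshold)
        alphas.append(alpha_t)
        errs.append(err_t)
        # ACI update: alpha_{t+1} = alpha_t + gamma * (alpha - err_t).
        alpha_t = alpha_t + gamma * (alpha - err_t)
    return np.array(alphas), np.array(errs)
```

A miss (`err_t = 1`) lowers `alpha_t`, which widens the next set, and a cover raises it slightly. A larger `gamma` makes the level react faster to a shift but also makes it more volatile; this is the parameter that the paper tunes over time instead of fixing in advance.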