Multi-model Ensemble Conformal Prediction in Dynamic Environments

Erfan Hajihashemi, Yanning Shen
2024-11-06
Abstract: Conformal prediction is an uncertainty quantification method that constructs a prediction set for a previously unseen datum, ensuring the true label is included with a predetermined coverage probability. Adaptive conformal prediction has been developed to address data distribution shifts in dynamic environments. However, the efficiency of prediction sets varies depending on the learning model used. Employing a single fixed model may not consistently offer the best performance in dynamic environments with unknown data distribution shifts. To address this issue, we introduce a novel adaptive conformal prediction framework, where the model used for creating prediction sets is selected on the fly from multiple candidate models. The proposed algorithm is proven to achieve strongly adaptive regret over all intervals while maintaining valid coverage. Experiments on real and synthetic datasets corroborate that the proposed approach consistently yields more efficient prediction sets while maintaining valid coverage, outperforming alternative methods.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of inefficient prediction sets in dynamic environments where the data distribution changes over time. Specifically, adaptive conformal prediction methods that rely on a single fixed model may not maintain good performance under unknown distribution shifts, particularly in terms of prediction set size and coverage probability. To meet this challenge, the authors propose a multi-model ensemble adaptive conformal prediction framework (Strongly Adaptive Multi-model Ensemble Online Conformal Prediction, SAMOCP), which dynamically selects the most appropriate model among multiple candidates to create prediction sets, improving the efficiency of the prediction sets while maintaining valid coverage.

### Main contributions of the paper:

1. **Proposing a new algorithm**: The paper introduces the SAMOCP algorithm, which is designed for dynamic environments with unknown data distribution shifts and selects models on the fly based on their performance at the previous time step.
2. **Strongly adaptive regret**: The paper proves that SAMOCP achieves strongly adaptive regret over any time interval while ensuring valid coverage.
3. **Experimental verification**: Experiments on classification tasks show that SAMOCP achieves coverage close to the target value while constructing more efficient prediction sets than existing methods.

### Key technical details:

- **Nonconformity score**: The nonconformity score \( S_m(X, Y) \) of model \( m \) measures the degree of disagreement between the model's prediction and the true label.
- **Weight update**: The weight \( w_m^t \) and miscoverage probability \( \alpha_m^t \) of each model are updated after the true label is observed, so that they track changes in the data distribution.
- **Expert mechanism**: In SAMOCP, each model is regarded as an "expert". Multiple experts are created at different time points, each with a specific lifetime. This ensures that, in a dynamic environment, only the most recent experts participate in decision-making, which avoids the influence of obsolete experts.
- **Loss function**: The pinball loss is used to adjust the miscoverage probability, balancing prediction set size against coverage.

### Experimental results:

- **Coverage rate**: The coverage rate of SAMOCP on the different datasets is close to the target value, indicating that it effectively controls the miscoverage probability.
- **Prediction set size**: SAMOCP constructs smaller prediction sets on average than the compared methods, which improves prediction efficiency.
- **Adaptive regret**: Across different time intervals, the adaptive regret of SAMOCP is low, indicating that it adapts well in dynamic environments.

Through these contributions and experimental results, the paper demonstrates the effectiveness of SAMOCP in handling data distribution shifts in dynamic environments.
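
To make the weight update and the pinball-loss adjustment of the miscoverage probability more concrete, here is a minimal Python sketch. It is not the authors' SAMOCP implementation. It assumes the standard adaptive conformal inference step (a subgradient step on the pinball loss) for each model's miscoverage level, and a generic exponential-weights update for the model weights; the summary specifies neither rule exactly. The names `pinball_loss`, `update_alpha`, `update_weights`, `beta_t`, `step`, and `eta` are hypothetical and chosen for illustration only.

```python
import math


def pinball_loss(alpha_t, beta_t, target_alpha):
    """Pinball loss of the working miscoverage level alpha_t.

    beta_t is assumed to be the largest miscoverage level at which the
    true label would still have been inside the prediction set.
    """
    diff = beta_t - alpha_t
    # Covered (diff >= 0): small penalty for being too conservative.
    # Missed (diff < 0): larger penalty, weighted by 1 - target_alpha.
    return target_alpha * diff if diff >= 0 else (target_alpha - 1) * diff


def update_alpha(alpha_t, beta_t, target_alpha, step):
    """One subgradient step on the pinball loss.

    Equivalent to alpha_t + step * (target_alpha - err), where err is 1
    when the true label was missed. The result is not clipped to [0, 1].
    """
    err = 1.0 if beta_t < alpha_t else 0.0
    return alpha_t + step * (target_alpha - err)


def update_weights(weights, losses, eta):
    """Exponential-weights update (an assumed rule): models with a lower
    loss gain relative weight; the result is renormalised to sum to 1."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Two hypothetical candidate models at one time step, target miscoverage 0.1.
target = 0.1
alphas = [0.1, 0.3]   # current per-model miscoverage levels
betas = [0.3, 0.1]    # model 0 covered the label, model 1 missed it
losses = [pinball_loss(a, b, target) for a, b in zip(alphas, betas)]
new_alphas = [update_alpha(a, b, target, step=0.05) for a, b in zip(alphas, betas)]
new_weights = update_weights([0.5, 0.5], losses, eta=1.0)
```

In this toy step, the model that covered the label has its miscoverage level nudged upward (slightly smaller sets next time), while the model that missed has its level reduced and loses relative weight.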