URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci
2023-10-19
Abstract: Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided at https://github.com/mkirchhof/url.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is how to develop pretrained models that provide transferable uncertainty estimates. Specifically, the authors point out that current machine learning models often cannot quantify the uncertainty of their predictions, especially when facing new datasets. To promote progress in this field, they propose a new benchmark, the Uncertainty-aware Representation Learning (URL) benchmark, which evaluates how well a pretrained model transfers not only its representations but also its uncertainty estimates.

### Problem Background

1. **Success of Representation Learning**: In recent years, representation learning has made pretrained models a powerful starting point for many downstream tasks. After pretraining on large-scale datasets, these models can be transferred to new datasets in a zero-shot or few-shot manner.
2. **Need for Uncertainty Quantification**: With the increasing demand for reliable machine learning, especially in high-risk applications (such as medical image classification), models need to quantify the uncertainty of their predictions. Uncertainty quantification can help models avoid making wrong predictions in uncertain situations.
3. **Existing Challenges**: Although representation learning has made significant progress, there is currently no effective method to ensure that the uncertainty estimates of pretrained models transfer well to new datasets. Most existing benchmarks evaluate uncertainty only on the dataset used for training and do not consider performance on unseen datasets.
### Solution

To address these problems, the authors propose the URL benchmark, which has the following characteristics:

- **Evaluating the Transferability of Representations and Uncertainties**: URL evaluates the representation quality of pretrained models on new datasets (through Recall@1) and introduces a new metric, Recall@1 AUROC (R-AUROC), to evaluate how well the uncertainty estimates transfer.
- **Experimental Design**: The authors use ImageNet as the upstream dataset, select eight downstream datasets, and evaluate eleven uncertainty quantification methods. The goal is to reveal which methods best preserve the quality of their uncertainty estimates under transfer.

### Main Findings

1. **Transferring Uncertainty Estimation Remains an Unsolved Challenge**: Even the best-performing methods have much lower uncertainty estimation performance on new datasets than models trained with multiple samples.
2. **Some Methods Perform Well**: For example, MCInfoNCE and direct loss prediction perform better on multiple evaluation metrics.
3. **Uncertainty Estimation and Representation Learning Do Not Necessarily Conflict**: Some models maintain good representation learning performance while improving uncertainty estimation.
4. **Upstream Performance Does Not Equal Downstream Performance**: A model's uncertainty estimation performance on the upstream dataset is a poor predictor of its performance on the downstream datasets.
5. **URL Captures the Degree of Alignment between Model and Human Uncertainty**: By comparing with the uncertainty of human annotators, the authors find that R-AUROC reflects how well model uncertainty aligns with human uncertainty.
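
To make the R-AUROC metric named under Solution more concrete, here is a minimal sketch in plain Python. It rests on an assumption that the summary above does not spell out: that R-AUROC is the AUROC of the model's uncertainty score for predicting whether a sample's Recall@1 lookup fails, meaning its nearest neighbor in embedding space belongs to a different class. The function names and the toy data are hypothetical and are not taken from the URL codebase.

```python
import math


def recall_at_1_hits(embeddings, labels):
    """Return 1 per sample if its nearest other sample shares its label, else 0."""
    hits = []
    for i, (e_i, y_i) in enumerate(zip(embeddings, labels)):
        best_j, best_d = None, math.inf
        for j, e_j in enumerate(embeddings):
            if i == j:
                continue  # a sample is not its own neighbor
            d = math.dist(e_i, e_j)  # Euclidean distance (an assumed choice)
            if d < best_d:
                best_j, best_d = j, d
        hits.append(1 if labels[best_j] == y_i else 0)
    return hits


def auroc(scores, positives):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


def r_auroc(embeddings, labels, uncertainties):
    """Return (Recall@1, R-AUROC) under the assumption stated in the lead-in."""
    hits = recall_at_1_hits(embeddings, labels)
    errors = [1 - h for h in hits]  # positives are the failed lookups
    return sum(hits) / len(hits), auroc(uncertainties, errors)


# Toy data: two tight clusters plus one "b" sample sitting inside the "a" cluster.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
toy_labels = ["a", "a", "b", "b", "b"]
toy_uncertainties = [0.1, 0.2, 0.1, 0.3, 0.9]  # highest on the misplaced sample

recall, score = r_auroc(toy_embeddings, toy_labels, toy_uncertainties)
print(recall, score)  # prints: 0.8 1.0
```

In this toy case only the misplaced sample fails its lookup, and it also has the highest uncertainty, so the score is 1.0. An uncertainty estimate unrelated to retrieval errors would score around 0.5. Because the score needs only embeddings, downstream labels, and the pretrained uncertainty output, it can be computed without any training on the downstream dataset, which fits the zero-shot setting described in the abstract.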
### Conclusion

The authors hope that the URL benchmark will promote research on uncertainty estimation in pretrained models, so that these models not only perform well in representation learning but also provide reliable uncertainty estimates on new data. This would help improve the reliability and robustness of machine learning models in practical applications.