Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection

Geng Yu, Jianing Zhu, Jiangchao Yao, Bo Han
2024-11-05
Abstract: Out-of-distribution (OOD) detection is crucial for deploying reliable machine learning models in open-world applications. Recent advances in CLIP-based OOD detection have shown promising results via regularizing prompt tuning with OOD features extracted from ID data. However, the irrelevant context mined from ID data can be spurious due to the inaccurate foreground-background decomposition, thus limiting the OOD detection performance. In this work, we propose a novel framework, namely, Self-Calibrated Tuning (SCT), to mitigate this problem for effective OOD detection with only the given few-shot ID data. Specifically, SCT introduces modulating factors respectively on the two components of the original learning objective. It adaptively directs the optimization process between the two tasks during training on data with different prediction uncertainty to calibrate the influence of OOD regularization, which is compatible with many prompt-tuning-based OOD detection methods. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed SCT. The code is publicly available.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
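The abstract describes SCT as placing modulating factors on the two components of the learning objective, with their balance driven by prediction uncertainty. A minimal sketch of that idea follows. It is not the authors' implementation. It assumes that the OOD regularizer is an entropy-maximization term over image regions flagged as background, and that the factors are the linear pair `1 - p` and `p`, where `p` is the predicted probability of the true class. The function names and the `lam` weight are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sct_loss(global_logits, label, ood_region_logits, lam=0.25):
    """Per-sample modulated objective (illustrative sketch only).

    global_logits:     class logits for the whole image
    label:             index of the ground-truth ID class
    ood_region_logits: one list of class logits per local region that was
                       flagged as ID-irrelevant (surrogate OOD) context
    lam:               hypothetical weight on the OOD regularizer
    Returns (loss, classification_weight, ood_weight).
    """
    p_true = softmax(global_logits)[label]
    ce = -math.log(p_true)  # classification term (cross-entropy)

    # Assumed regularizer: push predictions on surrogate OOD regions toward
    # uniform, i.e. minimize the negative mean entropy over those regions.
    if ood_region_logits:
        mean_entropy = sum(entropy(softmax(z)) for z in ood_region_logits)
        mean_entropy /= len(ood_region_logits)
        ood_reg = -mean_entropy
    else:
        ood_reg = 0.0

    # Assumed linear modulating factors: an uncertain sample (small p_true)
    # up-weights classification; a confident sample up-weights the regularizer.
    w_cls = 1.0 - p_true
    w_ood = p_true
    loss = w_cls * ce + lam * w_ood * ood_reg
    return loss, w_cls, w_ood
```

Calling `sct_loss([5.0, 0.0, 0.0], 0, [[1.0, 0.0, 0.0]])` yields a larger OOD weight than the same call with `global_logits=[0.1, 0.0, 0.0]`, because the first prediction is more confident. In a gradient-based framework the two weights would typically be treated as constants during backpropagation; that detail is also an assumption here.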
What problem does this paper attempt to address?
This paper addresses how to detect out-of-distribution (OOD) data effectively when deploying machine learning models in open-world applications. Specifically, it focuses on improving the OOD detection performance of pre-trained vision-language models (VLMs) through the Self-Calibrated Tuning (SCT) framework, using only a small amount of in-distribution (ID) data.

### Background and Problems

1. **Importance of OOD Detection**
   - OOD detection is crucial for ensuring the reliability and safety of machine learning models in the open world, especially in critical applications such as autonomous driving or medical intelligence.
   - Deep neural networks (DNNs) tend to be over-confident when facing OOD data, which may lead to serious safety problems.
2. **Limitations of Existing Methods**
   - Existing CLIP-based OOD detection methods regularize training by extracting background features from ID data as proxies for OOD data.
   - However, these methods rely on an inaccurate foreground-background decomposition, so the extracted background features may be irrelevant or even wrong, which limits OOD detection performance.

### Main Contributions of the Paper

1. **Conceptual Contributions**
   - The authors study the problem of imperfect OOD features caused by inaccurate foreground-background decomposition in prompt-tuning-based OOD detection methods.
   - They observe that ID data with different prediction uncertainties have different impacts on OOD regularization.
2. **Technical Contributions**
   - A new learning framework, Self-Calibrated Tuning (SCT), is proposed. It calibrates the impact of OOD features mined from different ID data by adaptively adjusting the weights between the two tasks during model optimization.
   - SCT introduces modulation factors on the two parts of the original learning objective: the classification task and the OOD regularization task. These factors are adjusted dynamically according to each sample's uncertainty, so the model focuses more on classification for low-confidence data and more on OOD regularization for high-confidence data.
3. **Empirical Contributions**
   - Extensive experiments verify the effectiveness of SCT. It significantly improves OOD detection performance on the large-scale ImageNet-1k benchmark, notably achieving a 3% improvement over the best existing method in the FPR95 metric.
   - The authors also conduct ablation experiments and further discussions to provide an in-depth understanding of the method.

### Method Overview

1. **Preliminary Definitions**
   - The VLM-based OOD detection task is defined, where the ID distribution is determined by the ID categories specified by the downstream task.
   - Standard prompt-tuning methods and prompt-tuning-based OOD detection methods such as LoCoOp are introduced.
2. **Motivation**
   - The authors show empirically that the quality of OOD features extracted from ID data is highly correlated with sample uncertainty. As the uncertainty increases, the extracted OOD features become increasingly unreliable.
   - Such unreliable OOD features damage the model's calibration and OOD detection performance. A new mechanism is therefore needed that takes sample uncertainty into account to help the model learn from imperfect OOD features.
3. **Self-Calibrated Tuning (SCT)**
   - SCT adaptively adjusts the importance of the classification task and the OOD regularization task by introducing modulation factors.
   - When the model has high prediction uncertainty for an ID sample, SCT focuses more on the classification task and reduces the impact of invalid OOD features; when the model has low prediction uncertainty, SCT focuses more on the OOD regularization task and strengthens the positive impact of useful ID-independent features.
   - In the implementation, SCT uses a linear function as the modulation factor to avoid introducing additional hyper-parameters.

### Experimental Results

1. **Main Results**
   - On ImageNet-1k