Abstract: Detection of out-of-distribution (OOD) samples is crucial for the safe real-world deployment of machine learning models. Recent advances in vision-language foundation models have made it possible to detect OOD samples without requiring in-distribution (ID) images. However, these zero-shot methods often underperform because they do not adequately consider ID class likelihoods in their detection confidence scores. Hence, we introduce CLIPScope, a zero-shot OOD detection approach that normalizes the confidence score of a sample by class likelihoods, akin to a Bayesian posterior update. Furthermore, CLIPScope incorporates a novel strategy to mine OOD classes from a large lexical database. It selects the class labels that are farthest from and nearest to the ID classes in terms of CLIP embedding distance, to maximize coverage of OOD samples. We conduct extensive ablation studies and empirical evaluations, demonstrating the state-of-the-art performance of CLIPScope across various OOD detection benchmarks.
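
The abstract describes normalizing a sample's confidence score by class likelihoods, akin to a Bayesian posterior update, but does not spell out the formula. The following is only a minimal sketch of one way to realize that idea, assuming NumPy and precomputed CLIP image-text cosine similarities. The helper names, the temperature of 0.01, and the choice to estimate class likelihoods from a smoothed histogram of top-1 predictions over a batch of test samples are illustrative assumptions, not CLIPScope's actual implementation.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def estimate_class_prior(probs: np.ndarray, smoothing: float = 1.0) -> np.ndarray:
    """Hypothetical class-likelihood estimate: Laplace-smoothed histogram of top-1 predictions."""
    n_classes = probs.shape[1]
    counts = np.bincount(probs.argmax(axis=1), minlength=n_classes).astype(float)
    return (counts + smoothing) / (counts.sum() + smoothing * n_classes)


def prior_normalized_id_score(sims: np.ndarray, n_id: int,
                              temperature: float = 0.01,
                              smoothing: float = 1.0) -> np.ndarray:
    """Score each image; higher means more in-distribution.

    sims: (N, C) cosine similarities between N images and C text labels.
          Columns [0, n_id) are ID class labels; any remaining columns are
          mined OOD labels (an assumed layout for this sketch).
    """
    probs = softmax(sims / temperature, axis=1)        # p(y | x) over all labels
    prior = estimate_class_prior(probs, smoothing)     # estimated p(y)
    adjusted = probs / prior                           # divide out the estimated prior
    adjusted /= adjusted.sum(axis=1, keepdims=True)    # renormalize each row
    return adjusted[:, :n_id].sum(axis=1)              # probability mass on ID labels
```

Dividing by the estimated prior and renormalizing is Bayes' rule with the estimated class prior swapped for a uniform one, so labels that absorb many test samples are down-weighted. When every label is predicted equally often, the adjustment reduces to the plain softmax ID mass.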

What problem does this paper attempt to address?

The paper addresses a critical issue in the real-world deployment of machine learning models: how to detect out-of-distribution (OOD) samples effectively. It proposes CLIPScope, a zero-shot OOD detection method that improves the detection of OOD samples by applying Bayesian inference to confidence scoring.

### Overview of the Problem Addressed by the Paper

- **Background and Challenges**: Machine learning systems typically assume that test data follow the same distribution as the training data. In practice, however, deployed models encounter OOD data that never appeared in the training set. Traditional OOD detectors rely on in-distribution (ID) training images, while recent zero-shot methods built on vision-language models avoid that requirement but often underperform because their confidence scores do not adequately account for the likelihoods of the known (ID) classes.
- **Proposed Method**: CLIPScope builds on CLIP (Contrastive Language-Image Pre-training), a powerful vision-language foundation model, and uses Bayesian inference to update the confidence scores of samples. It normalizes the confidence with which a sample is assigned to each category by that category's likelihood, so that OOD samples falling into high-frequency categories receive lower scores.
- **Innovations**:
  - Bayesian inference is used to adjust the confidence scores of samples, which improves detection accuracy.
  - A new strategy mines candidate OOD labels from a large lexical database (such as WordNet), selecting both the words nearest to and farthest from the known classes in CLIP embedding space to maximize coverage of the OOD sample space.
  - The method requires no additional training data or complex preprocessing; it relies only on existing resources (such as WordNet), which makes CLIPScope efficient and easy to implement.
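
The label-mining strategy can be sketched in the same hedged way. The snippet below assumes that CLIP text embeddings have already been computed for the ID class names and for candidate words drawn from a lexical database such as WordNet. It measures each candidate by its cosine similarity to the closest ID class and keeps candidates from both ends of that ranking. The function name, the `k_far`/`k_near` counts, and the `max_sim` cut-off (which drops near-duplicates of ID class names) are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mine_ood_labels(id_emb, cand_emb, cand_words, k_far, k_near, max_sim=0.95):
    """Hypothetical sketch: pick candidate words farthest from and nearest to the ID classes.

    id_emb:     (K, D) text embeddings of the ID class names.
    cand_emb:   (M, D) text embeddings of candidate words (e.g. WordNet nouns).
    cand_words: list of M candidate strings aligned with cand_emb.
    Returns (far_words, near_words); no word appears in both lists.
    """
    id_emb = l2_normalize(np.asarray(id_emb, dtype=float))
    cand_emb = l2_normalize(np.asarray(cand_emb, dtype=float))
    # Cosine similarity of every candidate to its closest ID class.
    closest_sim = (cand_emb @ id_emb.T).max(axis=1)
    # Assumed safeguard: drop candidates that are near-duplicates of ID classes.
    kept = np.flatnonzero(closest_sim < max_sim)
    # Order kept candidates from farthest (lowest similarity) to nearest.
    ranked = kept[np.argsort(closest_sim[kept], kind="stable")]
    far_idx = ranked[:k_far]
    # Take nearest candidates from the other end, excluding those already chosen as far.
    near_idx = ranked[k_far:][::-1][:k_near]
    return [cand_words[i] for i in far_idx], [cand_words[i] for i in near_idx]
```

Drawing from both ends of the similarity ranking, without overlap, mirrors the stated goal of covering OOD samples that lie far from the ID classes as well as those that lie close to them.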

### Overview of Experimental Results

- **Experimental Setup**: ImageNet-1K serves as the in-distribution benchmark, and several other datasets (iNaturalist, SUN, Places, and Textures) serve as OOD datasets for evaluation.
- **Evaluation Metrics**: Performance is measured with two standard metrics: AUROC (Area Under the Receiver Operating Characteristic Curve; higher is better) and FPR95 (False Positive Rate on OOD samples at a 95% True Positive Rate on ID samples; lower is better).
- **Comparison Methods**: CLIPScope is compared both with other zero-shot OOD detection methods (Mahalanobis distance, energy score, ZOC, MCM, CLIPN, and NegLabel) and with OOD detection methods that require training (such as MSP, ODIN, and GradNorm).
- **Performance**: According to the results in Table 1, CLIPScope outperforms the other methods on all tested OOD datasets in both AUROC and FPR95, indicating high accuracy and reliability in OOD detection.
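
Both metrics can be computed directly from per-sample scores. This NumPy sketch treats ID samples as positives and assumes that higher scores mean more ID-like. AUROC is computed as the probability that a random ID sample outscores a random OOD sample (ties count one half), and FPR95 uses one common threshold convention; library implementations may choose the threshold slightly differently.

```python
import numpy as np


def auroc(id_scores, ood_scores) -> float:
    """AUROC with ID as the positive class: P(ID score > OOD score), ties counted as 1/2."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    # Pairwise comparison matrix of shape (n_id, n_ood); fine for modest sizes.
    return float((id_s > ood_s).mean() + 0.5 * (id_s == ood_s).mean())


def fpr_at_tpr(id_scores, ood_scores, tpr: float = 0.95) -> float:
    """Fraction of OOD samples accepted at a threshold that keeps at least `tpr` of ID samples.

    `tpr` is expected to lie in (0, 1].
    """
    id_sorted = np.sort(np.asarray(id_scores, dtype=float))
    # At most (1 - tpr) of ID samples may fall strictly below the threshold;
    # the small epsilon guards against floating-point round-off.
    k = int(np.floor((1.0 - tpr) * len(id_sorted) + 1e-9))
    k = min(k, len(id_sorted) - 1)
    threshold = id_sorted[k]
    return float((np.asarray(ood_scores, dtype=float) >= threshold).mean())
```

For benchmark-sized score sets, the pairwise matrix in `auroc` becomes expensive; a rank-based (Mann-Whitney U) formula or `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.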

CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring
