Abstract: As vision-language models like CLIP are widely applied to zero-shot tasks and achieve remarkable performance on in-distribution (ID) data, detecting and rejecting out-of-distribution (OOD) inputs in the zero-shot setting has become crucial for ensuring the safety of using such models on the fly. Most existing zero-shot OOD detectors rely on ID class label-based prompts to guide CLIP in classifying ID images and rejecting OOD images. In this work we instead propose to leverage a large set of diverse auxiliary outlier class labels as pseudo OOD class text prompts to CLIP for enhancing zero-shot OOD detection, an approach we call Outlier Label Exposure (OLE). The key intuition is that ID images are expected to have lower similarity to these outlier class prompts than OOD images. One issue is that raw class labels often include noisy labels, e.g., synonyms of ID labels, rendering raw OLE-based detection ineffective. To address this issue, we introduce an outlier prototype learning module that utilizes the prompt embeddings of the outlier labels to learn a small set of pivotal outlier prototypes for an embedding similarity-based OOD scoring. Additionally, the outlier classes and their prototypes can be loosely coupled with the ID classes, leading to an inseparable decision region between them. Thus, we also introduce an outlier label generation module that synthesizes our outlier prototypes and ID class embeddings to generate in-between outlier prototypes to further calibrate the detection in OLE. Despite its simplicity, extensive experiments show that OLE substantially improves detection performance and achieves new state-of-the-art performance in large-scale OOD and hard OOD detection benchmarks.
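
The key intuition in the abstract (ID images should be less similar to outlier class prompts than OOD images) can be illustrated with a small sketch. This is not the authors' implementation: the function name `ood_score`, the joint-softmax form of the score, the `temperature` value, and the toy 3-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ood_score(image_emb, id_text_embs, outlier_text_embs, temperature=0.01):
    """Hypothetical score: softmax mass assigned to the outlier prompts.

    The softmax runs jointly over ID and outlier prompt similarities, so a
    higher value means the image looks more like the outlier prompts.
    """
    all_text_embs = id_text_embs + outlier_text_embs
    sims = [cosine(image_emb, t) for t in all_text_embs]
    probs = softmax([s / temperature for s in sims])
    return sum(probs[len(id_text_embs):])


# Toy embeddings (assumed values, standing in for CLIP outputs).
id_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two ID class prompts
outlier_text_embs = [[0.0, 0.0, 1.0]]              # one outlier class prompt
id_image = [0.9, 0.1, 0.0]                         # close to the first ID class
ood_image = [0.1, 0.0, 0.9]                        # close to the outlier prompt

id_score = ood_score(id_image, id_text_embs, outlier_text_embs)
ood_image_score = ood_score(ood_image, id_text_embs, outlier_text_embs)
print(id_score < ood_image_score)
```

An image would then be rejected as OOD when its score exceeds a chosen threshold; how that threshold is set is outside the scope of this sketch.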

What problem does this paper attempt to address?

The paper addresses a key issue in zero-shot out-of-distribution (OOD) detection: how to effectively identify and reject data from unknown distributions without training on the target dataset. Specifically, the study proposes a new method called Outlier Label Exposure (OLE).

### Research Background and Motivation

With the widespread application of vision-language models like CLIP in zero-shot tasks and their strong performance on in-distribution (ID) data, effective detection of OOD inputs has become crucial. Existing methods mostly rely on label prompts of ID categories to guide CLIP in classifying ID images and rejecting OOD images. However, these methods often have limitations, such as overconfident predictions due to insufficient knowledge of OOD samples.

### Solution

The proposed OLE method leverages a large number of auxiliary outlier category labels as pseudo-OOD text prompts for vision-language models like CLIP to enhance zero-shot OOD detection. The core idea is that ID images should have lower similarity to these outlier category text prompt embeddings, while OOD images should have higher similarity. This approach amplifies the OOD score gap between ID and OOD images.

### Key Technical Points

1. **Outlier Prototype Learning (OPL)**: Learning key outlier prototypes from large-scale raw outlier category labels to reduce the impact of noise and improve detection accuracy.
2. **Hard Outlier Prototype Generation (HOPG)**: Generating "hard" outlier prototypes located between outlier prototypes and marginal ID categories to further calibrate detection results.

### Main Contributions

1. Proposes the OLE method, a cost-effective solution that uses easily accessible outlier category labels to provide vision-language models with knowledge about unknown samples.
2. Designs a novel outlier prototype learning module that compresses a large number of raw outlier category labels into a set of key outlier prototypes for efficient and accurate zero-shot detection.
3. Introduces an outlier prototype generation module that generates outlier prototypes located between learned outlier prototypes and marginal ID category embeddings, further calibrating detection in OLE.
4. Experiments show that OLE significantly enhances the performance of the current best model, CLIPN, in large-scale OOD detection and hard OOD detection, achieving a new state-of-the-art level.

### Conclusion

This paper addresses the key challenge in zero-shot OOD detection by introducing the OLE method, which uses outlier category information to enhance the model's OOD detection capability. Through outlier prototype learning and generation, the method both simplifies the processing of outlier category labels and improves detection accuracy and robustness.
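
The two modules named in the summary can be made more concrete with a rough sketch. The text does not specify how prototypes are learned or generated, so this example swaps in two simple stand-ins: a similarity threshold that drops outlier labels that are near-synonyms of an ID class, and linear interpolation followed by renormalization to place a prototype between an outlier prototype and an ID class embedding. The names `filter_noisy_outliers`, `in_between_prototype`, `max_sim`, and `alpha`, along with the toy vectors, are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def filter_noisy_outliers(outlier_embs, id_embs, max_sim=0.8):
    """Stand-in for noise handling: drop outlier embeddings that are too
    similar to any ID class embedding (e.g. synonyms of ID labels)."""
    return [
        o for o in outlier_embs
        if max(cosine(o, c) for c in id_embs) < max_sim
    ]


def in_between_prototype(outlier_proto, id_emb, alpha=0.5):
    """Stand-in for hard prototype generation: interpolate between an
    outlier prototype and an ID embedding, then renormalize."""
    o, c = normalize(outlier_proto), normalize(id_emb)
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(o, c)]
    return normalize(mixed)


# Toy embeddings (assumed values, standing in for CLIP text embeddings).
id_embs = [[1.0, 0.0, 0.0]]
outlier_embs = [
    [0.99, 0.1, 0.0],  # near-synonym of the ID class, should be dropped
    [0.0, 0.0, 1.0],   # genuinely different outlier label
]

kept = filter_noisy_outliers(outlier_embs, id_embs)
hard = in_between_prototype(kept[0], id_embs[0])
print(len(kept), cosine(hard, id_embs[0]) > cosine(kept[0], id_embs[0]))
```

With `alpha=0.5` the generated prototype is equally similar to both endpoints, so it lies closer to the ID class than the original outlier prototype does. That is the sense in which such a prototype is "hard".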

Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure
