Abstract: Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Code is available at https://github.com/YBZh/LAPT.

What problem does this paper attempt to address?

The problem this paper addresses is the reliance on manual prompt engineering when Vision-Language Models (VLMs) are used for Out-of-Distribution (OOD) detection. An OOD detector must flag samples from unknown classes, and with VLMs its performance depends heavily on hand-designed prompts. Designing these prompts requires domain expertise, and the results are sensitive to small changes in wording, which makes practical deployment harder. To address this, the paper proposes a new method named "Label-driven Automated Prompt Tuning (LAPT)". LAPT reduces the need for manual prompt engineering in the following ways:

1. **Distribution-aware Prompts**: Given the names of the known (ID) classes and automatically mined negative labels, LAPT attaches learnable, distribution-aware prompt tokens to these class labels. Training samples for each label are collected automatically, either by synthesizing images with pre-trained text-to-image generation models or by retrieving real images from large-scale web datasets, so the prompt tokens can be learned without manual effort.
2. **Simple and Effective Cross-Entropy Loss**: LAPT optimizes the prompts with a simple cross-entropy loss, supported by two data-mixing techniques. Cross-modal mixing reduces image noise by combining the image and text features of the same class. Cross-distribution mixing randomly mixes the features of known classes and negative samples, together with their corresponding labels, to explore the intermediate space between the two distributions.
3. **Automated Process**: The LAPT framework runs autonomously. It requires only the names of the known classes as input, with no human intervention.

Beyond improving the separation between known-class and OOD samples, the method also improves classification accuracy on the known classes and generalization robustness to covariate shift. In extensive experiments, LAPT outperforms manually designed prompts on multiple OOD detection benchmarks and achieves new best results on near-OOD detection tasks without manual annotation or prompt design. Together, these improvements give LAPT strong performance on challenging full-spectrum OOD detection tasks.
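The two mixing strategies and the cross-entropy objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-D features, the dot-product logits, the fixed mixing coefficients, and the order in which the two mixes are applied are all assumptions made for illustration. The summary only states that the mixing is random, so how the coefficient `lam` is sampled is left open here.

```python
import math

def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def one_hot(index, num_classes):
    """One-hot label over the joint label space (ID classes + negative labels)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def cross_modal_mix(image_feat, text_feat, lam):
    """Blend a (possibly noisy) image feature with the text feature of the SAME class.
    The class label is unchanged, so only the feature is returned."""
    return mix(image_feat, text_feat, lam)

def cross_distribution_mix(id_feat, id_label, neg_feat, neg_label, lam):
    """Blend an ID feature with a negative-sample feature, and blend their labels
    with the same coefficient, giving a point between the two distributions."""
    return mix(id_feat, neg_feat, lam), mix(id_label, neg_label, lam)

def logits_from_prompts(feat, prompt_feats):
    """Assumption: logits are dot products between a feature and each prompt's text feature."""
    return [sum(f * p for f, p in zip(feat, prompt)) for prompt in prompt_feats]

def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a (possibly mixed) label distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for z, t in zip(logits, soft_label))

# Toy label space: indices 0 and 1 are ID classes, index 2 is a negative label.
# In practice these would be text features of the learnable prompts (hypothetical values here).
prompt_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
id_image = [0.8, 0.2]    # synthesized or retrieved image feature for ID class 0
neg_image = [-0.9, 0.1]  # image feature collected for the negative label

# Step 1: cross-modal mixing pulls the image feature toward its class text feature.
denoised = cross_modal_mix(id_image, prompt_feats[0], lam=0.5)

# Step 2: cross-distribution mixing interpolates between ID and negative samples.
mixed_feat, mixed_label = cross_distribution_mix(
    denoised, one_hot(0, 3), neg_image, one_hot(2, 3), lam=0.75
)

# Step 3: cross-entropy against the mixed label; this is the quantity that
# gradient descent on the prompt tokens would minimize.
loss = soft_cross_entropy(logits_from_prompts(mixed_feat, prompt_feats), mixed_label)
```

Because the cross-entropy is linear in the target distribution, the loss on a mixed label equals the same convex combination of the losses on the two original labels, which is why a single cross-entropy term can handle both pure and mixed samples.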

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Out-of-Distribution Detection with Negative Prompts

Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection

Negative Label Guided OOD Detection with Pretrained Vision-Language Models

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Enhancing Outlier Knowledge for Few-Shot Out-of-Distribution Detection with Extensible Local Prompts

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

Adversarial Prompt Distillation for Vision-Language Models

How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

Learning Transferable Negative Prompts for Out-of-Distribution Detection

LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning

DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection

Human-Free Automated Prompting for Vision-Language Anomaly Detection: Prompt Optimization with Meta-guiding Prompt Scheme

Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No