LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang
2024-07-12
Abstract: Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Codes are available at https://github.com/YBZh/LAPT.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the reliance on manual prompt engineering when Vision-Language Models (VLMs) such as CLIP are used for out-of-distribution (OOD) detection. OOD detection has to flag samples from unknown classes, and with VLMs its quality depends on hand-written prompts. Designing these prompts requires domain expertise, and the results are sensitive to small changes in wording, which makes the approach hard to apply in practice. To solve this problem, the paper proposes Label-driven Automated Prompt Tuning (LAPT), which reduces the need for manual prompt engineering in the following ways:

1. **Distribution-aware prompts**: Given the in-distribution (ID) class names and automatically mined negative labels, LAPT attaches learnable, distribution-aware prompt tokens to these class labels. The training samples used to learn the tokens are collected automatically, either by synthesizing images with a pre-trained text-to-image generation model or by retrieving real images from large-scale web datasets.
2. **Simple cross-entropy loss with mixing**: LAPT optimizes the prompts with a simple cross-entropy loss and strengthens the optimization with two data-mixing strategies. Cross-modal mixing combines the image and text features of the same class to reduce image noise. Cross-distribution mixing randomly mixes the features of ID and negative samples, together with their corresponding labels, to explore the intermediate space between the two distributions.
3. **Automated process**: The LAPT framework runs autonomously. It requires only the ID class names as input and no human intervention.
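The two mixing strategies and the cross-entropy objective described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the fixed mixing weights `lam`, the temperature value, and the random arrays standing in for CLIP image features and prompt-derived text features are all hypothetical. In the actual method the text features would depend on learnable prompt tokens and be updated by gradient descent, which this sketch omits.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length, as is common for CLIP-style features."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_mix(img_feats, txt_feats, labels, lam):
    """Blend each image feature with the text feature of its own class.

    Assumption: a convex combination with weight `lam` on the image side.
    """
    mixed = lam * img_feats + (1.0 - lam) * txt_feats[labels]
    return l2_normalize(mixed)

def cross_distribution_mix(id_feats, id_targets, neg_feats, neg_targets, lam):
    """Blend ID and negative features, and their labels, with the same weight."""
    feats = l2_normalize(lam * id_feats + (1.0 - lam) * neg_feats)
    targets = lam * id_targets + (1.0 - lam) * neg_targets
    return feats, targets

def soft_cross_entropy(feats, txt_feats, targets, temperature=0.01):
    """Cross-entropy between soft targets and softmax over cosine similarities."""
    logits = feats @ txt_feats.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

# Toy data: 3 ID classes followed by 4 negative labels (hypothetical sizes).
rng = np.random.default_rng(0)
num_id, num_neg, dim, batch = 3, 4, 8, 5
num_classes = num_id + num_neg
txt_feats = l2_normalize(rng.normal(size=(num_classes, dim)))  # stand-in for prompt features
id_labels = rng.integers(0, num_id, size=batch)
neg_labels = rng.integers(num_id, num_classes, size=batch)
id_img = l2_normalize(rng.normal(size=(batch, dim)))
neg_img = l2_normalize(rng.normal(size=(batch, dim)))
one_hot = np.eye(num_classes)

# Step 1: denoise image features by mixing them with same-class text features.
id_mixed = cross_modal_mix(id_img, txt_feats, id_labels, lam=0.5)
neg_mixed = cross_modal_mix(neg_img, txt_feats, neg_labels, lam=0.5)

# Step 2: mix ID and negative samples, along with their labels.
feats, targets = cross_distribution_mix(
    id_mixed, one_hot[id_labels], neg_mixed, one_hot[neg_labels], lam=0.7
)

# Step 3: evaluate the cross-entropy loss on the mixed batch.
loss = soft_cross_entropy(feats, txt_feats, targets)
print(f"loss on mixed batch: {loss:.4f}")
```

Mixing the labels with the same weight as the features keeps each target a valid probability distribution that spans one ID class and one negative label, which is one way to supervise points lying between the two distributions.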
According to the paper's experiments, LAPT consistently outperforms manually crafted prompts on multiple OOD detection benchmarks and sets a new standard for OOD detection, without manual annotation or prompt design. Beyond sharpening the distinction between ID and OOD samples, it also improves ID classification accuracy and strengthens generalization robustness to covariate shift. Together, these improvements yield strong performance on challenging full-spectrum OOD detection tasks.