Abstract: Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process. Recent progress in representation learning has given rise to distance-based OOD detection, which classifies inputs as ID or OOD according to their relative distances to the training data of ID classes. Previous approaches calculate pairwise distances using only global image representations, which can be sub-optimal because the inevitable background clutter and intra-class variation may drive image-level representations from the same ID class far apart in a given representation space. In this work, we address this challenge by proposing Multi-scale OOD DEtection (MODE), the first framework that leverages both global visual information and local region details of images to maximally benefit OOD detection. Specifically, we first find that existing models pretrained with off-the-shelf cross-entropy or contrastive losses fail to capture valuable local representations for MODE, due to the scale discrepancy between the ID training and OOD detection processes. To mitigate this issue and encourage locally discriminative representations in ID training, we propose Attention-based Local PropAgation (ALPA), a trainable objective that exploits a cross-attention mechanism to align and highlight the local regions of the target objects for pairwise examples. During test-time OOD detection, a Cross-Scale Decision (CSD) function is further devised on the most discriminative multi-scale representations to distinguish ID/OOD data more faithfully. We demonstrate the effectiveness and flexibility of MODE on several benchmarks. On average, MODE outperforms the previous state-of-the-art by up to 19.24% in FPR and 2.77% in AUROC. Code is available at https://github.com/JimZAI/MODE-OOD.
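
To make the distance-based setting in the abstract concrete, here is a minimal sketch of the kind of global-representation baseline it describes, not of MODE, ALPA, or CSD themselves. It assumes that feature vectors have already been extracted by some encoder and scores each test input by the distance to its k-th nearest ID training feature. The function names, the choice of k, and the synthetic 2-D features are all hypothetical and are used only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def knn_ood_score(train_feats, test_feats, k=5):
    """Return one score per test row: the Euclidean distance to its
    k-th nearest ID training feature. Larger means more OOD-like.

    Assumption: both inputs are (n, d) arrays of global (image-level)
    features; no local or multi-scale features are used here.
    """
    train = l2_normalize(np.asarray(train_feats, dtype=np.float64))
    test = l2_normalize(np.asarray(test_feats, dtype=np.float64))
    # Pairwise distances with shape (n_test, n_train).
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # np.partition places the k-th smallest value at index k - 1.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


# Synthetic stand-ins for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
id_train = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(200, 2))
id_test = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(20, 2))
ood_test = rng.normal(loc=[0.0, 5.0], scale=0.1, size=(20, 2))

id_scores = knn_ood_score(id_train, id_test)
ood_scores = knn_ood_score(id_train, ood_test)
```

In practice a threshold on this score (for example, one chosen so that 95% of held-out ID data falls below it) would separate ID from OOD inputs. The abstract's argument is that such image-level distances can be unreliable when background clutter or intra-class variation pushes same-class features apart, which is the motivation for adding local region representations.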

Delving into Out-of-Distribution Detection with Vision-Language Representations

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection

COOD: Concept-based Zero-shot OOD Detection

General-Purpose Multi-Modal OOD Detection Framework

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving

CMG: A Class-Mixed Generation Approach to Out-of-Distribution Detection

Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection

From Global to Local: Multi-scale Out-of-distribution Detection

A Unified Approach to Semi-Supervised Out-of-Distribution Detection

Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion

Learning Multi-Manifold Embedding for Out-Of-Distribution Detection

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models

Matching Words for Out-of-distribution Detection

Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model