Abstract: Out-of-distribution (OOD) detection is crucial in many real-world applications. However, intelligent models are often trained solely on in-distribution (ID) data, which leads to overconfidence when OOD data are misclassified as ID classes. In this study, we propose a new learning framework that leverages simple Jigsaw-based fake OOD data and rich semantic embeddings ('anchors') from ChatGPT descriptions of ID knowledge to guide the training of the image encoder. The learning framework can be flexibly combined with existing post-hoc approaches to OOD detection, and extensive empirical evaluations on multiple OOD detection benchmarks demonstrate that rich textual representations of ID knowledge, together with fake OOD knowledge, help train a visual encoder for OOD detection. With the learning framework, new state-of-the-art performance was achieved on all the benchmarks. The code is available at https://github.com/Cverchen/TagFog.

What problem does this paper attempt to address?

The paper addresses the following problem: during deployment in real-world applications, intelligent models often encounter samples whose distribution differs from that of the training data, i.e., out-of-distribution (OOD) samples. These samples usually come from unknown classes that never appeared during training. Misclassifying them as known in-distribution (ID) classes can have serious consequences in applications such as autonomous driving and intelligent healthcare, so an AI model must be able to tell accurately whether new data are OOD samples or belong to known classes.

Specifically, the paper proposes a new learning framework, TagFog (Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection), which aims to improve OOD detection in two ways:

1. **Generate fake OOD data**: Apply a simple Jigsaw transformation to generate fake OOD data, helping the model better distinguish between ID data and real OOD data.
2. **Use textual anchor guidance**: Generate a description of each ID category with ChatGPT and feed it into the pre-trained CLIP text encoder to obtain richer semantic embeddings, which serve as anchors that guide the training of the image encoder.

By combining these two components, the TagFog framework achieves state-of-the-art performance on multiple OOD detection benchmarks, effectively improving the model's ability to detect OOD samples.

### Formula Summary

- **Cross-entropy loss \( L_{CE} \)**: \[ L_{CE} = -\frac{1}{N + M}\sum_{i=1}^{N + M}\sum_{k=1}^{K + 1} y_{i,k}\log(\hat{y}_{i,k}) \] where \( N \) and \( M \) are the numbers of ID training images and fake OOD images respectively, \( K \) is the number of ID categories (the additional \( (K+1) \)-th category holds the fake OOD images), \( \hat{y}_{i,k} \) is the output probability that the \( i \)-th training image belongs to the \( k \)-th category, and \( y_{i,k} \) is the corresponding ground-truth output (0 or 1).
- **Contrastive loss \( L_{CI} \)**: \[ L_{CI} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} \mathbb{1}(y_{n,k} \neq 0)\cdot\log\left(\frac{\exp(s(z_n,\mu_k)/\tau)}{\sum_{j=1}^{K}\exp(s(z_n,\mu_j)/\tau)}\right) \] where \( z_n = g(f(x_n)) \) is the projected visual embedding of the input ID image \( x_n \), \( \mu_k \) is the textual anchor of the \( k \)-th ID category, \( s(z_n,\mu_k) \) is the cosine similarity between the two embeddings, \( \mathbb{1}(\cdot) \) is the indicator function, and \( \tau \) is the temperature scaling factor.
- **Supervised contrastive loss \( L_{SC} \)**: \[ L_{SC} = -\frac{1}{S}\sum_{i=1}^{S}\frac{1}{|P(i)|}\sum_{p\in P(i)}\log\left(\frac{\exp(s(z_i,z_p)/\tau')}{\sum_{a\in A(i)}\exp(s(z_i,z_a)/\tau')}\right) \] where \( S = N + M \), \( A(i) \) denotes all sample indices in the mini-batch containing the sample with index \( i \), \( P(i) \) is the subset of \( A(i) \) whose samples share the same category label as the sample with index \( i \), and \( \tau' \) is the corresponding temperature scaling factor.
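
To make the pieces above more concrete, the following NumPy sketch illustrates a Jigsaw-style patch shuffle for producing fake OOD images, together with \( L_{CE} \) and \( L_{CI} \) as written in the formula summary. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the grid size, the temperature value, and the use of random arrays in place of real images, encoder outputs, and CLIP text anchors are all hypothetical.

```python
import numpy as np


def jigsaw_shuffle(image, grid, rng):
    """Cut an (H, W, C) image into grid x grid patches and permute them."""
    h, w, c = image.shape
    ph, pw = h // grid, w // grid
    # Crop so the image divides evenly into patches.
    image = image[: ph * grid, : pw * grid]
    patches = (
        image.reshape(grid, ph, grid, pw, c)
        .transpose(0, 2, 1, 3, 4)          # (row, col, ph, pw, c)
        .reshape(grid * grid, ph, pw, c)
    )
    # Note: a random permutation can occasionally be the identity.
    shuffled = patches[rng.permutation(grid * grid)]
    return (
        shuffled.reshape(grid, grid, ph, pw, c)
        .transpose(0, 2, 1, 3, 4)          # (row, ph, col, pw, c)
        .reshape(ph * grid, pw * grid, c)
    )


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    """L_CE over N + M samples and K + 1 classes (index K = fake OOD)."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def anchor_contrastive(z, anchors, labels, tau):
    """L_CI: pull each ID embedding z_n towards its class anchor mu_k."""
    sims = l2_normalize(z) @ l2_normalize(anchors).T  # cosine similarities
    logp = log_softmax(sims / tau)
    return -logp[np.arange(len(labels)), labels].mean()


# Toy usage with random stand-ins (assumed shapes and hyperparameters).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
fake_ood = jigsaw_shuffle(image, grid=4, rng=rng)

K, dim = 10, 16
logits = rng.normal(size=(8, K + 1))        # classifier outputs
labels = rng.integers(0, K + 1, size=8)     # label K marks fake OOD
z = rng.normal(size=(8, dim))               # projected ID embeddings
anchors = rng.normal(size=(K, dim))         # stand-in textual anchors
id_labels = rng.integers(0, K, size=8)

print("L_CE:", cross_entropy(logits, labels))
print("L_CI:", anchor_contrastive(z, anchors, id_labels, tau=0.1))
```

The shuffle keeps every pixel of the original image but destroys its global layout, which is what lets the shuffled image act as a cheap outlier. Both losses reduce to a log-softmax followed by picking the entry of the true class, because the one-hot \( y_{i,k} \) and the indicator \( \mathbb{1}(y_{n,k} \neq 0) \) each select a single term of the inner sum.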

TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution Detection
