Abstract: One of the early weaknesses identified in deep neural networks trained for image classification tasks was their inability to provide low confidence predictions on out-of-distribution (OOD) data that was significantly different from the in-distribution (ID) data used to train them. Representation learning, where neural networks are trained in specific ways that improve their ability to detect OOD examples, has emerged as a promising direction for solving this problem. However, these approaches require long training times and can be computationally inefficient at detecting OOD examples. Recent developments in Vision Transformer (ViT) foundation models (large networks trained on large and diverse datasets with self-supervised approaches) also show strong performance in OOD detection, and could potentially address some of these challenges. This paper presents Mixture of Exemplars (MoLAR), an approach that provides a unified way of tackling OOD detection challenges in both supervised and semi-supervised settings and that is designed to be trained with a frozen, pretrained foundation model backbone. MoLAR is efficient to train, and provides strong OOD performance when only comparing the distance of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset. As a result, determining if an image is OOD with MoLAR is no more expensive than classifying an image. Extensive experiments demonstrate the superior OOD detection performance of MoLAR in comparison to comparable approaches, and also the strong performance of MoLAR in semi-supervised settings.

What problem does this paper attempt to address?

The paper addresses a known weakness of deep neural networks in image classification: they fail to give low-confidence predictions for out-of-distribution (OOD) data that differ significantly from the training data. It proposes Mixture of Exemplars (MoLAR), a unified method for detecting OOD data efficiently in both supervised and semi-supervised settings. MoLAR is designed to be trained on top of a frozen, pretrained foundation model backbone. It decides whether a sample is OOD by comparing the distance between that sample and a small set of representative samples (called "exemplars"), so OOD detection costs no more than classification.

### Background of the Paper

An early weakness identified in deep neural networks was their difficulty in giving low-confidence predictions for OOD data that differ significantly from the training set. Representation learning methods (such as PALM and CIDER) have been proposed to address this problem, but they usually require long training times and high computational cost. Their performance with large pretrained models (such as Vision Transformer, ViT) is also unsatisfactory.

### Contributions of the Paper

1. **Proposing MoLAR**: MoLAR is a new OOD detection method that works efficiently in supervised and semi-supervised settings. It uses a set of representative samples (exemplars) to define the centers of the von Mises-Fisher (vMF) components of a mixture model, and decides whether a sample is OOD from the distance between the sample and these exemplars.
2. **Efficiency**: Detecting OOD data with MoLAR costs the same as making a classification prediction.
3. **Extensive Experimental Verification**: On the OpenOOD benchmark, the paper reports superior performance for MoLAR on multiple datasets. In the semi-supervised setting, MoLAR-SS (the semi-supervised version of MoLAR) reaches OOD detection performance comparable to, or better than, that of supervised methods.

### Method Overview

- **MoLAR in Supervised Settings**:
  - Defines a mixture model based on the von Mises-Fisher (vMF) distribution, where the exemplars of each category define the centers of the vMF components.
  - Classifies with Bayes' rule and fits the model parameters by maximum likelihood estimation.
- **MoLAR-SS in Semi-supervised Settings**:
  - In the absence of labels, obtains a strong supervision signal from multi-view augmentation and averaged, sharpened predictions.
  - Initializes the projection head by minimizing the Kullback-Leibler (KL) divergence between the probability distributions in the target space and the original space.
- **Exemplar Selection Strategy**:
  - Proposes a simple k-means exemplar selection strategy (SKMPS), which selects the most representative samples as exemplars through clustering.

### Experimental Results

- **OOD Detection Performance**: MoLAR outperforms existing methods at OOD detection on multiple datasets. MoLAR-SS performs especially well in the semi-supervised setting with a small amount of labeled data.
- **Computational Efficiency**: Detecting OOD data with MoLAR costs the same as a classification prediction, so OOD detection adds no cost beyond classification.
- **Semi-supervised Learning Performance**: MoLAR-SS is competitive with existing methods on multiple semi-supervised learning benchmarks.
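
The supervised classification rule and the exemplar-distance OOD score described in the method overview can be illustrated with a minimal sketch. The summary does not give MoLAR's exact formulation, so the following are assumptions made purely for illustration: every vMF component shares one concentration `kappa`, all components have equal mixture weight, embeddings are plain Python lists, and the OOD score is one minus the cosine similarity to the nearest exemplar. The function name `classify_and_score` and the toy exemplars are hypothetical.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere, where vMF densities are defined."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def classify_and_score(z, exemplars, kappa=10.0):
    """Return (class posterior, OOD score) for one embedding z.

    exemplars maps each class label to a list of exemplar embeddings.
    Assumption: equal-weight vMF components with a shared kappa, so the
    vMF normalizing constant cancels when Bayes' rule is applied.
    """
    z = normalize(z)
    class_logits = {}
    best_sim = float("-inf")
    for label, vectors in exemplars.items():
        # Cosine similarity between z and each exemplar (component center).
        sims = [sum(a * b for a, b in zip(z, normalize(e))) for e in vectors]
        best_sim = max(best_sim, max(sims))
        # Unnormalized log-likelihood of the class: log sum_k exp(kappa * sim_k).
        class_logits[label] = logsumexp([kappa * s for s in sims])
    # Bayes' rule: normalize class likelihoods over all classes.
    total = logsumexp(list(class_logits.values()))
    posterior = {c: math.exp(l - total) for c, l in class_logits.items()}
    # Assumed OOD score: distance to the nearest exemplar (larger = more OOD).
    ood_score = 1.0 - best_sim
    return posterior, ood_score


# Toy 2-D exemplars, purely illustrative.
EXEMPLARS = {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0]]}
posterior, score = classify_and_score([1.0, 0.05], EXEMPLARS)
_, far_score = classify_and_score([-1.0, -1.0], EXEMPLARS)
print(posterior, score, far_score)
```

Both outputs come from the same exemplar similarities, which is one way to read the claim that OOD detection costs no more than classification: the similarities needed for the posterior already contain the nearest-exemplar distance.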
In conclusion, MoLAR addresses the overconfidence of deep neural networks on OOD data while remaining efficient in both supervised and semi-supervised settings.
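
The exemplar selection step mentioned in the method overview can be sketched in the same spirit. The summary only says that SKMPS picks representative samples through k-means clustering, so this is a generic "sample nearest to each k-means centroid" sketch and not the paper's actual procedure. The function name `select_exemplars`, the plain Lloyd iterations, the fixed iteration count, and the toy data are all assumptions.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_exemplars(embeddings, k, iters=20, seed=0):
    """Return sorted indices of samples closest to k k-means centroids.

    Assumption: plain Lloyd's algorithm with random initialization.
    Fewer than k indices can be returned if two centroids share the
    same nearest sample.
    """
    rng = random.Random(seed)
    init = rng.sample(range(len(embeddings)), k)
    centroids = [list(embeddings[i]) for i in init]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in embeddings:
            # Assign each sample to its nearest centroid.
            j = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            clusters[j].append(x)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # The exemplar for each cluster is the real sample nearest its centroid.
    chosen = {
        min(range(len(embeddings)), key=lambda i: sq_dist(embeddings[i], c))
        for c in centroids
    }
    return sorted(chosen)


# Toy embeddings forming two well-separated groups, purely illustrative.
DATA = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(select_exemplars(DATA, k=2))
```

Choosing real samples rather than the centroids themselves keeps the exemplars as actual images from the dataset, which matches the abstract's description of exemplars as "a small set of images chosen to be representative of the dataset".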

A Unified Approach to Semi-Supervised Out-of-Distribution Detection

Continual Unsupervised Out-of-Distribution Detection

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model

Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection

PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection

Delving into Out-of-Distribution Detection with Vision-Language Representations

Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection

Matching Words for Out-of-distribution Detection

Unveiling the unseen: novel strategies for object detection beyond known distributions

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

Calibrated Out-of-Distribution Detection with a Generic Representation

Mahalanobis-Aware Training for Out-of-Distribution Detection

Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving

Learning by Erasing: Conditional Entropy based Transferable Out-Of-Distribution Detection

Detecting Out-of-Distribution Examples Via Class-Conditional Impressions Reappearing

Class Relevance Learning For Out-of-distribution Detection