Abstract: The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation, distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning and lacked comprehensive representations. In our study, we conducted a comprehensive analysis, exploring distinct pretraining tasks and employing various OOD score functions. The results highlight that feature representations pre-trained through reconstruction yield a notable enhancement and narrow the performance gap among various score functions. This suggests that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Reconstruction-based pretext tasks adapt well to various score functions and, as such, hold promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 improves AUROC by 14.30% to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
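To make the terms "score function" and "AUROC" in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes two commonly used simple scores, maximum softmax probability (probability-based) and log-sum-exp energy (logit-based), and evaluates them on synthetic logits. The function names, the synthetic data, and the choice of these two scores are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits):
    """Probability-based score: maximum softmax probability per sample."""
    return softmax(logits).max(axis=1)

def energy_score(logits):
    """Logit-based score: log-sum-exp of the logits per sample."""
    m = logits.max(axis=1)
    return m + np.log(np.exp(logits - m[:, None]).sum(axis=1))

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores higher than a random
    OOD sample, with ties counted as one half."""
    a = np.asarray(id_scores, dtype=float)[:, None]
    b = np.asarray(ood_scores, dtype=float)[None, :]
    wins = (a > b).sum()
    ties = (a == b).sum()
    return (wins + 0.5 * ties) / (a.size * b.size)

# Synthetic stand-in for classifier outputs (an assumption for illustration):
# ID samples get one strongly activated class, OOD samples get none.
rng = np.random.default_rng(0)
n, num_classes = 200, 10
id_logits = rng.normal(size=(n, num_classes))
id_logits[np.arange(n), rng.integers(0, num_classes, size=n)] += 6.0
ood_logits = rng.normal(size=(n, num_classes))

for name, fn in [("MSP", msp_score), ("Energy", energy_score)]:
    print(name, "AUROC:", auroc(fn(id_logits), fn(ood_logits)))
```

Higher scores are treated as "more in-distribution" here, so an AUROC of 1.0 means every ID sample outscores every OOD sample, and 0.5 corresponds to chance.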

What problem does this paper attempt to address?

The problem this paper addresses is how to obtain a feature representation strong enough to distinguish in-distribution (ID) data from out-of-distribution (OOD) data in anomaly detection and OOD detection. Most earlier OOD detection methods rely on recognition-based techniques, which often lead to shortcut learning and produce incomplete feature representations. The paper addresses this by adopting Masked Image Modeling (MIM) as the pre-training task.

### Main contributions of the paper

1. **Using MIM as the pre-training task**: The paper proposes using Masked Image Modeling (MIM) as a pre-training task to improve OOD detection. MIM randomly masks part of the image, so the model must infer the masked part from the remaining part and thereby reconstruct the image. This forces the model to learn pixel-level feature representations rather than only the patterns that suffice for classification.
2. **Verifying the effectiveness of MIM**: The paper verifies experimentally that MIM pre-training is effective for OOD detection. The MIM pre-trained model significantly improves AUROC (Area Under the Receiver Operating Characteristic Curve) on multiple OOD datasets, notably with ImageNet and CIFAR-10 as the in-distribution data.
3. **Analyzing the performance of different score functions**: The paper examines how different OOD score functions (probability-based, logit-based, feature-based, and hybrid) perform under different pre-training tasks. With the MIM pre-trained model, even simple score functions become comparable to complex ones.
4. **Proposing the MOODv2 framework**: Based on this analysis, the paper proposes a new OOD detection framework, MOODv2 (Masked Image Modeling for Out-of-Distribution Detection v2). The framework achieves a significant performance improvement by using the MIM pre-trained model and combining feature and logit score functions.

### Main findings

- **Advantages of MIM pre-training**: The MIM pre-trained model performs well in OOD detection on both natural and non-natural images and effectively separates ID from OOD data.
- **Selection of score functions**: Score functions that combine features and logits (such as ViM) perform best in most cases.
- **Generalization ability**: The MIM pre-trained model performs well on multiple OOD datasets, indicating good generalization.

### Conclusion

By adopting MIM as the pre-training task, the paper significantly improves OOD detection performance, especially on complex and diverse OOD data. This gives the OOD detection field a new and effective method and is expected to encourage further work in this area.
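The masking step that the summary describes (hide random parts of the image, then reconstruct them from what remains) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the MOODv2 code: the 8-pixel patch size, the 0.4 mask ratio, raw pixels as the reconstruction target, and all function names are hypothetical, and the actual model may reconstruct a different target.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into rows of shape (num_patches, patch*patch*C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_row, grid_col, patch_row, patch_col, C)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask with round(mask_ratio * num_patches) entries set to True."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the masked patches."""
    return float(((pred - target) ** 2)[mask].mean())

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # stand-in for a real image
patches = patchify(image, patch=8)         # 16 patches of 8*8*3 = 192 values
mask = random_mask(len(patches), mask_ratio=0.4, rng=rng)

corrupted = patches.copy()
corrupted[mask] = 0.0                      # the model only sees unmasked content

# A trivial "model" that returns its input leaves the masked patches wrong,
# so its loss is positive; a perfect reconstruction has zero loss.
print("masked patches:", int(mask.sum()), "of", len(patches))
print("loss of identity model:", masked_reconstruction_loss(corrupted, patches, mask))
print("loss of perfect model:", masked_reconstruction_loss(patches, patches, mask))
```

Computing the loss only on masked patches is the design choice that matters here: the model gets no credit for copying visible pixels, so it has to use the surrounding context to fill in what is hidden.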

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need

Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability

Classifier-head Informed Feature Masking and Prototype-based Logit Smoothing for Out-of-Distribution Detection

From Global to Local: Multi-scale Out-of-distribution Detection

MIM-OOD: Generative Masked Image Modelling for Out-of-Distribution Detection in Medical Images

Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design

Matching Words for Out-of-distribution Detection

Image Background Serves as Good Proxy for Out-of-distribution Data

TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning

Delving into Out-of-Distribution Detection with Vision-Language Representations

Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection

Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model

SR-OOD: Out-of-Distribution Detection via Sample Repairing

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Mitigating Overconfidence in Out-of-Distribution Detection by Capturing Extreme Activations

Unveiling the unseen: novel strategies for object detection beyond known distributions

Rethinking Out-of-Distribution Detection From a Human-Centric Perspective

Learning by Erasing: Conditional Entropy based Transferable Out-Of-Distribution Detection