MOODv2: Masked Image Modeling for Out-of-Distribution Detection

Jingyao Li, Pengguang Chen, Shaozuo Yu, Shu Liu, Jiaya Jia
2024-01-05
Abstract: The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation, distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning, lacking comprehensive representations. In our study, we conducted a comprehensive analysis, exploring distinct pretraining tasks and employing various OOD score functions. The results highlight that the feature representations pre-trained through reconstruction yield a notable enhancement and narrow the performance gap among various score functions. This suggests that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Reconstruction-based pretext tasks adapt well to various score functions. As such, they hold promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 improves AUROC by 14.30% to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to obtain a feature representation strong enough to separate in-distribution (ID) data from out-of-distribution (OOD) data in OOD detection. Most earlier OOD detection methods rely on recognition-based pre-training, which often leads to shortcut learning and incomplete feature representations. The paper addresses this by using Masked Image Modeling (MIM) as the pre-training task.

### Main contributions of the paper

1. **Using MIM as the pre-training task**: The paper uses Masked Image Modeling (MIM) as the pretext task to improve OOD detection. MIM randomly masks part of the image, so the model must infer the masked part from the remaining part and reconstruct the image. This forces the model to learn pixel-level feature representations instead of only the patterns that suffice for classification.
2. **Verifying the effectiveness of MIM**: Experiments show that the MIM pre-trained model improves AUROC (Area Under the Receiver Operating Characteristic Curve) on multiple OOD datasets. According to the abstract, MOODv2 reaches 95.68% AUROC on ImageNet, an improvement of 14.30%, and 99.98% on CIFAR-10.
3. **Analyzing the performance of different score functions**: The paper compares OOD score functions (probability-based, logit-based, feature-based, and hybrid methods) under different pre-training tasks. With the MIM pre-trained model, the performance gap among score functions narrows, and even simple score functions become comparable to complex ones.
4. **Proposing the MOODv2 framework**: Based on this analysis, the paper proposes an OOD detection framework, MOODv2 (Masked Image Modeling for Out-of-Distribution Detection v2). The framework combines the MIM pre-trained model with a score function that uses both features and logits.

### Main findings

- **Advantages of MIM pre-training**: The MIM pre-trained model performs well in OOD detection on both natural and non-natural images and separates ID from OOD data effectively.
- **Selection of score functions**: Score functions that combine features and logits (such as ViM) perform best in most cases.
- **Generalization ability**: The MIM pre-trained model performs well on multiple OOD datasets, which indicates good generalization.

### Conclusion

By using MIM as the pre-training task, the paper improves OOD detection performance, particularly on complex and diverse OOD data. Because reconstruction-based pre-training works well across score functions, the approach offers a practical basis for further work on OOD detection.
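
The idea of scoring samples and comparing score functions by AUROC can be illustrated with a minimal sketch. It is not the MOODv2 implementation and does not reproduce ViM; it only shows one probability-based score (maximum softmax probability) and one logit-based score (log-sum-exp of the logits), plus a pairwise AUROC computation. The function names `msp_score`, `energy_score`, and `auroc`, as well as the toy logits, are assumptions made for illustration and do not come from the paper.

```python
import math


def msp_score(logits):
    """Probability-based score: maximum softmax probability (higher = more ID-like)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def energy_score(logits):
    """Logit-based score: log-sum-exp of the logits (higher = more ID-like)."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def auroc(id_scores, ood_scores):
    """Fraction of (ID, OOD) pairs where the ID sample scores higher; ties count 0.5."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Toy logits, made up for illustration: ID samples have one dominant class,
# OOD samples have flatter, lower logits.
id_logits = [[6.0, 0.5, 0.2], [5.0, 1.0, 0.0], [0.3, 4.5, 0.1]]
ood_logits = [[1.0, 0.9, 1.1], [0.5, 0.4, 0.6], [2.0, 1.8, 0.2]]

for name, fn in [("msp", msp_score), ("energy", energy_score)]:
    value = auroc([fn(z) for z in id_logits], [fn(z) for z in ood_logits])
    print(f"{name}: AUROC = {value:.3f}")
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable on real benchmarks. A feature-based or hybrid score would additionally need the penultimate-layer features, which this sketch leaves out.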