Abstract: Out-of-distribution (OOD) detection is essential for ensuring the robustness of machine learning models by identifying samples that deviate from the training distribution. While traditional OOD detection has primarily focused on single-modality inputs, such as images, recent advances in multimodal models have demonstrated the potential of leveraging multiple modalities (e.g., video, optical flow, audio) to enhance detection performance. However, existing methods often overlook intra-class variability within in-distribution (ID) data, assuming that samples of the same class are perfectly cohesive and consistent. This assumption can lead to performance degradation, especially when prediction discrepancies are uniformly amplified across all samples. To address this issue, we propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations. Our method dynamically updates class center representations for each class by measuring the variance of similar samples within each batch, enabling adaptive adjustments. This approach allows us to amplify prediction discrepancies based on the updated class centers, thereby improving the model's robustness and generalization across different modalities. Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection. To facilitate accessibility and reproducibility, our code is publicly available on GitHub.
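As a rough illustration of the prototype-updating idea in the abstract, the sketch below moves one class center toward a similarity-weighted batch mean and shrinks the step when the batch is less cohesive. It is not the authors' update rule: the function names, the cosine weighting, the `base_rate` parameter, and the `base_rate / (1 + variance)` step size are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def update_prototype(prototype, batch, base_rate=0.5):
    """Hypothetical variance-aware update of a single class prototype.

    prototype: current class center (list of floats)
    batch:     embeddings of this class in the current batch (list of lists)
    base_rate: assumed maximum step size, not a value from the paper
    """
    if not batch:
        return list(prototype)

    # Weight each sample by its non-negative similarity to the current
    # prototype, so dissimilar samples (likely outliers) contribute little.
    weights = [max(cosine(x, prototype), 0.0) for x in batch]
    total = sum(weights)
    if total == 0.0:
        return list(prototype)

    dim = len(prototype)
    center = [
        sum(w * x[d] for w, x in zip(weights, batch)) / total
        for d in range(dim)
    ]

    # Variance of the similarities: a simple proxy for how cohesive the
    # class looks in this batch. Higher variance gives a smaller step.
    mean_w = total / len(weights)
    variance = sum((w - mean_w) ** 2 for w in weights) / len(weights)
    rate = base_rate / (1.0 + variance)

    return [(1.0 - rate) * p + rate * c for p, c in zip(prototype, center)]
```

In this sketch a sample pointing away from the prototype receives zero weight, so it cannot drag the class center, while the variance term makes updates more cautious for batches whose samples disagree with one another.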

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper addresses out-of-distribution (OOD) detection for multimodal data. Traditional OOD detection methods focus mainly on unimodal inputs such as images. Multimodal methods can improve detection by combining several modalities (such as video, optical flow, and audio), but they often overlook intra-class variability within in-distribution (ID) data, implicitly assuming that samples of the same class are perfectly cohesive and consistent. This assumption can degrade performance, especially when prediction discrepancies are amplified uniformly across all samples.

To tackle this challenge, the authors propose Dynamic Prototype Updating (DPU), a plug-and-play framework for multimodal OOD detection that accounts for intra-class variability. DPU improves the robustness and generalization of the model by dynamically updating the center representation of each class according to the variance of similar samples within each batch.

### Main Contributions

1. **New Observation**: The authors identify and discuss the negative impact of intra-class variability within ID data on OOD detection, which they present as the first such analysis.
2. **Novel Model-Agnostic Framework**: They propose DPU, a flexible method that handles intra-class variability and is compatible with various existing OOD detection models.
3. **Effectiveness**: Extensive experiments show that DPU significantly improves OOD detection performance across two tasks, five datasets, and nine baseline OOD methods, with improvements of up to 80% in Far-OOD detection.

### Method Overview

The DPU framework consists of three key steps:

1. **Cohesive-Separate Contrastive Training (CSCT)**: Enhances intra-class consistency through marginal contrastive learning while maintaining inter-class distinction, so that the learned representation space has both intra-class cohesion and inter-class separability.
2. **Dynamic Prototype Approximation (DPA)**: Adaptively updates class prototypes based on the similarity between samples and prototypes, reducing the impact of outliers so that each prototype more accurately represents the core features of its class.
3. **Pro-ratio Discrepancy Intensification (PDI)**: Adjusts prediction discrepancies based on the similarity between samples and their class prototypes, enhancing ID accuracy and improving model robustness.

### Experimental Results

Experimental results show that DPU significantly improves OOD detection performance across multiple datasets and baseline OOD methods, and that it performs particularly well on Far-OOD detection. For instance, with HMDB51 as the ID dataset, DPU significantly outperforms baseline methods in terms of FPR95 (lower is better) and AUROC (higher is better) on OOD datasets such as Kinetics600, UCF101, HAC, and EPIC-Kitchens.

### Conclusion

DPU handles intra-class variability within ID data by dynamically adjusting multimodal prediction discrepancies, significantly improving OOD detection performance and providing a new solution for multimodal OOD detection.
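For readers unfamiliar with the two metrics named in the experimental results, the following minimal sketch computes AUROC and FPR95 from detector scores. It assumes the common convention that a higher score means "more likely in-distribution"; the function names and the exact thresholding rule are illustrative choices, not taken from the paper's evaluation code.

```python
import math


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as half a win. This pairwise form is O(n * m), which is
    fine for small examples but slow for large evaluation sets.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when at least 95% of ID
    samples are accepted (true positive rate of at least 95%)."""
    ranked = sorted(id_scores, reverse=True)
    keep = math.ceil(0.95 * len(ranked))  # number of ID samples to accept
    threshold = ranked[keep - 1]          # lowest accepted ID score
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)
```

A better detector pushes AUROC toward 1 and FPR95 toward 0: the first measures overall ranking quality, while the second measures how many OOD samples slip through at an operating point that keeps nearly all ID samples.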

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
