Abstract: Out-of-distribution (OOD) detection is essential for ensuring the robustness of machine learning models by identifying samples that deviate from the training distribution. While traditional OOD detection has primarily focused on single-modality inputs, such as images, recent advances in multimodal models have demonstrated the potential of leveraging multiple modalities (e.g., video, optical flow, audio) to enhance detection performance. However, existing methods often overlook intra-class variability within in-distribution (ID) data, assuming that samples of the same class are perfectly cohesive and consistent. This assumption can lead to performance degradation, especially when prediction discrepancies are uniformly amplified across all samples. To address this issue, we propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations. Our method dynamically updates class center representations for each class by measuring the variance of similar samples within each batch, enabling adaptive adjustments. This approach allows us to amplify prediction discrepancies based on the updated class centers, thereby improving the model's robustness and generalization across different modalities. Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection. To facilitate accessibility and reproducibility, our code is publicly available on GitHub.
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve
This paper addresses out-of-distribution (OOD) detection for multimodal data. Existing OOD detection methods mainly target unimodal inputs such as images. Multimodal approaches, which combine modalities such as video, optical flow, and audio, have shown promise for improving detection. However, they often overlook intra-class variability within in-distribution (ID) data, implicitly assuming that samples of the same class are perfectly cohesive and consistent. This assumption can degrade performance, especially when prediction discrepancies are amplified uniformly across all samples.
To tackle this challenge, the authors propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variability. DPU dynamically updates the prototype (center representation) of each class and uses the variance of similar samples within each batch to adapt the update. It then amplifies prediction discrepancies relative to these updated prototypes. According to the authors, this improves the model's robustness and generalization across modalities.
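The batch-wise prototype update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several elements are hypothetical choices made for this example:

- the function names `l2_normalize` and `update_prototypes`;
- the cosine-similarity threshold `sim_threshold`, which decides which samples count as "similar";
- the `base_momentum` parameter;
- the rule that shrinks the update step as the variance of the similar samples grows.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors (along the last axis) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def update_prototypes(prototypes, feats, labels, base_momentum=0.9, sim_threshold=0.5):
    """Hypothetical variance-aware prototype update (not the paper's exact rule).

    prototypes: (C, D) unit-norm class centers
    feats:      (N, D) batch embeddings
    labels:     (N,)   integer class labels
    """
    prototypes = np.array(prototypes, dtype=float)  # work on a copy
    feats = l2_normalize(feats)
    for c in np.unique(labels):
        class_feats = feats[labels == c]
        sims = class_feats @ prototypes[c]            # cosine similarity to the current center
        similar = class_feats[sims >= sim_threshold]  # keep samples close to the prototype
        if len(similar) == 0:
            continue  # only outliers for this class in the batch: leave the center unchanged
        batch_mean = similar.mean(axis=0)
        spread = similar.var(axis=0).sum()            # total variance of the similar samples
        # Larger spread -> smaller step, so a noisy batch moves the prototype less.
        alpha = (1.0 - base_momentum) / (1.0 + spread)
        prototypes[c] = l2_normalize((1.0 - alpha) * prototypes[c] + alpha * batch_mean)
    return prototypes


# Usage: three classes in 8-D, one noisy sample per class near its prototype.
rng = np.random.default_rng(0)
protos = l2_normalize(rng.normal(size=(3, 8)))
labels = np.array([0, 1, 2])
feats = protos[labels] + 0.05 * rng.normal(size=(3, 8))
new_protos = update_prototypes(protos, feats, labels)
print(new_protos.shape)  # (3, 8)
```

Filtering by similarity keeps outliers from dragging a prototype. Shrinking the step when the retained samples are spread out means a high-variance batch moves the center less. The weighting used in the paper may differ.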
### Main Contributions
1. **New Observation**: For the first time, the negative impact of intra-class variability within ID data on OOD detection is identified and discussed.
2. **Novel Model-Agnostic Framework**: DPU is a flexible, plug-and-play method that handles intra-class variability and can be combined with a variety of existing OOD detection methods.
3. **Effectiveness**: Extensive experiments across two tasks, five datasets, and nine base OOD algorithms show that DPU significantly improves OOD detection performance, with improvements of up to 80% in Far-OOD detection.
### Method Overview
The DPU framework consists of three key steps:
1. **Cohesive-Separate Contrastive Training (CSCT)**: Uses margin-based contrastive learning to increase intra-class consistency while preserving inter-class separation. The resulting representation space is both cohesive within classes and separable between them.
2. **Dynamic Prototype Approximation (DPA)**: Adaptively updates class prototypes based on the similarity between samples and prototypes, reducing the impact of outliers and making each prototype more accurately represent the core features of its class.
3. **Pro-ratio Discrepancy Intensification (PDI)**: Adjusts how strongly prediction discrepancies are amplified for each sample, based on that sample's similarity to its class prototype. This improves both ID accuracy and model robustness.
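The general idea behind CSCT, pulling same-class embeddings together while pushing different-class embeddings at least a margin apart, can be illustrated with the classic pairwise contrastive loss with a margin. This is a stand-in chosen for illustration, not the paper's exact CSCT objective. The function name `margin_contrastive_loss` and its `margin` parameter are hypothetical.

```python
import numpy as np


def margin_contrastive_loss(feats, labels, margin=1.0):
    """Classic pairwise contrastive loss with a margin (stand-in for CSCT).

    Same-class pairs are pulled together via their squared distance;
    different-class pairs are penalized only while closer than `margin`.
    """
    diffs = feats[:, None, :] - feats[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1) + 1e-12)  # (N, N) Euclidean distances
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)          # drop self-pairs
    pull = (dists ** 2)[same & off_diag]                 # cohesion term
    push = np.maximum(0.0, margin - dists)[~same] ** 2   # separation term
    n_pairs = max(pull.size + push.size, 1)
    return float((pull.sum() + push.sum()) / n_pairs)


# Two tight, well-separated clusters give a small loss...
feats = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
good = margin_contrastive_loss(feats, np.array([0, 0, 1, 1]))
# ...while labels that mix the clusters give a large one.
bad = margin_contrastive_loss(feats, np.array([0, 1, 0, 1]))
```

The pairwise distance matrix grows quadratically with batch size. This is acceptable for a per-batch training loss but worth keeping in mind for large batches.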
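The PDI idea of scaling discrepancy amplification per sample can be sketched as below. Everything here is an assumption made for illustration:

- the discrepancy measure, an L1 distance between two modalities' softmax outputs;
- the use of cosine similarity to the sample's own class prototype as the scaling weight;
- the name `pdi_discrepancy_loss`.

In particular, the summary does not say whether amplification grows with similarity or with dissimilarity. The sketch scales by similarity; replacing `weight` with `1 - weight` gives the opposite choice.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pdi_discrepancy_loss(logits_a, logits_b, feats, labels, prototypes):
    """Hypothetical similarity-scaled discrepancy term (sketch, not the paper's formula).

    logits_a, logits_b: (N, C) logits from two modalities (e.g. video and optical flow)
    feats:              (N, D) unit-norm embeddings
    labels:             (N,)   integer class labels
    prototypes:         (C, D) unit-norm class prototypes
    Minimizing the returned value increases the weighted discrepancy.
    """
    discrepancy = np.abs(softmax(logits_a) - softmax(logits_b)).sum(axis=1)  # per-sample L1
    sim = (feats * prototypes[labels]).sum(axis=1)   # cosine similarity to own prototype
    weight = np.clip(sim, 0.0, 1.0)                  # ASSUMPTION: scale by similarity
    return float(-(weight * discrepancy).mean())


# Sample 0 sits on its prototype, sample 1 is orthogonal to it (weight 0).
prototypes = np.eye(2)
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0])
logits_a = np.array([[2.0, 0.0], [2.0, 0.0]])
logits_b = np.array([[0.0, 2.0], [0.0, 2.0]])
loss = pdi_discrepancy_loss(logits_a, logits_b, feats, labels, prototypes)
```

Unlike a uniform scheme, the per-sample weight lets samples at different distances from their prototype receive different amounts of discrepancy amplification.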
### Experimental Results
Experimental results show that DPU substantially improves OOD detection performance across multiple datasets and base OOD methods, with particularly strong gains in Far-OOD detection. For instance, with HMDB51 as the ID dataset, DPU outperforms the baseline methods on both FPR95 and AUROC for OOD datasets such as Kinetics600, UCF101, HAC, and EPIC-Kitchens.
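The two reported metrics can be computed as follows, assuming the common convention that a higher score means a sample looks more in-distribution.

- FPR95 is the fraction of OOD samples accepted at the threshold that retains 95% of ID samples. Lower is better.
- AUROC is the probability that a random ID sample scores higher than a random OOD sample. Higher is better.

The helper names are hypothetical. The pairwise AUROC below uses O(N*M) memory, so a library routine such as scikit-learn's `roc_auc_score` is preferable for large evaluations.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted at the threshold that keeps ~95% of ID samples.

    Convention assumed here: a higher score means "more in-distribution".
    """
    id_scores, ood_scores = np.asarray(id_scores), np.asarray(ood_scores)
    threshold = np.percentile(id_scores, 5)  # ~95% of ID scores lie at or above this value
    return float(np.mean(ood_scores >= threshold))


def auroc(id_scores, ood_scores):
    """AUROC as P(ID score > OOD score), counting ties as 0.5 (Mann-Whitney form)."""
    id_col = np.asarray(id_scores)[:, None]
    ood_row = np.asarray(ood_scores)[None, :]
    wins = np.mean(id_col > ood_row)   # averaged over all ID/OOD pairs
    ties = np.mean(id_col == ood_row)
    return float(wins + 0.5 * ties)


print(auroc([0.9, 0.8, 0.95], [0.1, 0.2]))          # 1.0
print(fpr_at_95_tpr([0.9, 0.8, 0.95], [0.1, 0.2]))  # 0.0
```

Because `np.percentile` interpolates between samples, the threshold keeps approximately rather than exactly 95% of ID samples on small sets.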
### Conclusion
DPU handles intra-class variability within ID data by adjusting multimodal prediction discrepancies according to dynamically updated class prototypes. This substantially improves OOD detection performance and offers a new approach to multimodal OOD detection.