A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund
2024-04-02
Abstract: The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussion around how these methods perform when the underlying classifier is not trained on a clean, carefully curated dataset. In this work, we take a closer look at 20 state-of-the-art OOD detection methods in the (more realistic) scenario where the labels used to train the underlying classifier are unreliable (e.g. crowd-sourced or web-scraped labels). Extensive experiments across different datasets, noise types & levels, architectures and checkpointing strategies provide insights into the effect of class label noise on OOD detection, and show that poor separation between incorrectly classified ID samples vs. OOD samples is an overlooked yet important limitation of existing methods. Code:
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper primarily explores the performance of existing Out-of-Distribution (OOD) detection methods when the training data for classifiers contains label noise. Specifically, the study focuses on the following aspects:

1. **Research Background and Motivation**:
   - OOD detection is crucial for the safe deployment of computer vision systems, especially in identifying images that do not belong to the model's training domain in image classification tasks.
   - Despite the growing interest in developing post-hoc OOD detection methods, there is little discussion on how these methods perform when the base classifier is not trained on clean, well-curated datasets.
2. **Research Questions**:
   - The paper investigates how 20 state-of-the-art OOD detection methods perform in more realistic scenarios, i.e., when the labels used to train the base classifier are unreliable.
   - It particularly focuses on experimental results across different datasets, types and levels of noise, architectures, and checkpointing strategies to understand the impact of label noise on OOD detection.
3. **Key Findings**:
   - In the presence of label noise, most existing OOD detection methods struggle to effectively distinguish between misclassified in-distribution samples and OOD samples, which is an overlooked but significant limitation.
   - The paper demonstrates through extensive experiments that even low levels of label noise pose challenges to many methods and identifies some methods that perform better under such conditions.
   - It analyzes the relationship between OOD detection performance and in-distribution classification performance, noting that this relationship becomes more complex in the presence of label noise.
4. **Contributions**:
   - Provides the first study on the performance of post-hoc OOD detection methods on datasets with label noise.
   - Re-examines the correlation between OOD detection performance and in-distribution accuracy, exploring when and why this relationship holds.
   - Offers suggestions for future OOD detection method evaluation and development, particularly in dealing with unreliable label settings.

In summary, this paper aims to fill a gap in existing OOD detection research by examining how current methods work in the presence of label noise. Through detailed experimental analysis, the paper reveals the significant impact of label noise on OOD detection effectiveness and provides valuable insights for future OOD detection research.
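
To make the idea of "separation between incorrectly classified ID samples and OOD samples" concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes maximum softmax probability (MSP) as an illustrative post-hoc OOD score and AUROC as the separation metric; the function names (`msp_score`, `auroc`, `split_auroc`) and the toy logits are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def msp_score(logits):
    # Illustrative post-hoc score (assumption): maximum softmax probability.
    # Higher values are read as "more likely in-distribution".
    return softmax(logits).max(axis=1)


def auroc(id_scores, ood_scores):
    # Probability that a random ID sample scores higher than a random
    # OOD sample, with ties counted as one half.
    # Note: returns nan if either array is empty.
    diff = id_scores[:, None] - ood_scores[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def split_auroc(id_logits, id_labels, ood_logits):
    # Report ID-vs-OOD separation overall, and separately for ID samples
    # the classifier got right and ID samples it got wrong.
    id_scores = msp_score(id_logits)
    ood_scores = msp_score(ood_logits)
    correct = id_logits.argmax(axis=1) == id_labels
    return {
        "all_id_vs_ood": auroc(id_scores, ood_scores),
        "correct_id_vs_ood": auroc(id_scores[correct], ood_scores),
        "incorrect_id_vs_ood": auroc(id_scores[~correct], ood_scores),
    }


# Toy data (made up for illustration): 4 ID samples, 2 OOD samples, 3 classes.
id_logits = np.array([
    [5.0, 0.0, 0.0],   # confident and correct
    [0.0, 4.0, 0.0],   # confident and correct
    [1.0, 0.8, 0.5],   # unsure, predicted class 0 but label is 2
    [0.2, 0.9, 0.1],   # unsure, predicted class 1 but label is 0
])
id_labels = np.array([0, 1, 2, 0])
ood_logits = np.array([
    [0.5, 0.4, 0.3],
    [1.0, 0.2, 0.1],
])

print(split_auroc(id_logits, id_labels, ood_logits))
```

On this toy data the correctly classified ID samples are perfectly separated from the OOD samples, while the misclassified ID samples are no better than chance, so the aggregate number hides the failure. Reporting the split alongside the overall AUROC is one way to surface the limitation described above.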