Mining In-distribution Attributes in Outliers for Out-of-distribution Detection

Yutian Lei, Luping Ji, Pei Liu
2024-12-16
Abstract: Out-of-distribution (OOD) detection is indispensable for deploying reliable machine learning systems in real-world scenarios. Recent works, using auxiliary outliers in training, have shown good potential. However, they seldom concern the intrinsic correlations between in-distribution (ID) and OOD data. In this work, we discover an obvious correlation that OOD data usually possesses significant ID attributes. These attributes should be factored into the training process, rather than blindly suppressed as in previous approaches. Based on this insight, we propose a structured multi-view-based out-of-distribution detection learning (MVOL) framework, which facilitates rational handling of the intrinsic in-distribution attributes in outliers. We provide theoretical insights on the effectiveness of MVOL for OOD detection. Extensive experiments demonstrate the superiority of our framework to others. MVOL effectively utilizes both auxiliary OOD datasets and even wild datasets with noisy in-distribution data. Code is available at https://github.com/UESTC-nnLab/MVOL.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to detect out-of-distribution (OOD) data effectively when deploying reliable machine-learning systems in real-world scenarios. Although existing methods based on auxiliary outliers have had some success, they usually ignore the in-distribution (ID) attributes that OOD data may contain. As a result, the auxiliary outliers are used poorly, which hurts OOD detection performance.

### Core Problems of the Paper

1. **Limitations of Existing Methods**:
   - Existing outlier exposure (OE)-based methods often blindly suppress the model's response to outliers during training, ignoring the ID attributes these outliers may contain.
   - This can degrade the model's performance on OOD data, especially on outliers that share some ID features.
2. **Research Motivation**:
   - The authors found through experiments that OOD data usually contains significant ID attributes, which should be used during training rather than simply suppressed.
   - They therefore propose a new framework that makes rational use of the ID attributes in OOD data to improve OOD detection.

### Solution

To solve these problems, the authors propose a structured multi-view-based out-of-distribution learning framework (MVOL). Its main contributions are as follows:

1. **Extended Multi-view Data Model (MVDM)**:
   - By assuming that OOD data consists mainly of minor ID features and noise, the authors extend the original MVDM and expose the intrinsic ID attributes in outliers.
2. **New Insights into MaxLogit as an OOD Scoring Method**:
   - MaxLogit is presented as an interpretable and effective OOD score that measures the ID attributes contained in a test input.
   - Outliers that contain only minor ID features tend to receive a lower MaxLogit score.
3. **Multi-view-based Learning Objective**:
   - A new learning objective explicitly uses the intrinsic minor ID features in auxiliary outliers to calibrate unexpectedly high logits.
   - Unlike traditional OE methods, it does not treat all classes equally; instead, it adjusts adaptively according to whether a class contains minor ID features.

### Experimental Verification

To verify the effectiveness of the proposed method, the authors conducted extensive experiments. The results show that MVOL significantly outperforms existing OOD detection methods on multiple benchmark datasets. In particular, MVOL remains adaptable and robust on wild datasets that contain ID noise.

### Summary

By analyzing the intrinsic relationship between OOD data and ID data, this paper proposes the MVOL framework, which addresses the unreasonable use of auxiliary outliers in existing methods and improves OOD detection performance.
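
The two ingredients the summary contrasts, MaxLogit scoring and the conventional OE term that treats all classes equally, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the example logits, and the threshold are all hypothetical, and the MVOL objective itself is not reproduced here because the text above does not specify it.

```python
import math
from typing import List, Sequence


def max_logit_score(logits: Sequence[float]) -> float:
    """MaxLogit OOD score: the largest raw (pre-softmax) logit.

    A higher score suggests the input carries stronger ID attributes.
    """
    return max(logits)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def uniform_oe_loss(logits: Sequence[float]) -> float:
    """Conventional outlier-exposure term for one auxiliary outlier.

    Cross-entropy between the uniform distribution and the softmax output.
    It pushes every class toward the same probability, i.e. it treats all
    classes equally, which is the behaviour the summary says MVOL avoids.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def is_ood(logits: Sequence[float], threshold: float) -> bool:
    """Flag an input as OOD when its MaxLogit falls below a threshold.

    The threshold is an assumption; in practice it would be chosen on
    held-out ID data.
    """
    return max_logit_score(logits) < threshold


# Hypothetical logits for a 3-class model (illustrative values only).
id_logits = [8.0, 1.0, 0.5]       # one dominant class: strong ID attributes
outlier_logits = [2.0, 1.5, 1.0]  # weak response on every class
THRESHOLD = 5.0                   # hypothetical decision threshold

print(is_ood(id_logits, THRESHOLD), is_ood(outlier_logits, THRESHOLD))
print(uniform_oe_loss(outlier_logits))
```

The uniform OE term is smallest when all logits are equal, so minimizing it flattens the outlier's response across every class, including any class whose features the outlier genuinely shares.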