Abstract: Modern machine learning models deployed in real-world applications often encounter distribution shifts, manifesting as covariate or semantic out-of-distribution (OOD) shifts. These shifts give rise to challenges in OOD generalization and OOD detection. This paper introduces a novel, integrated approach, AHA (Adaptive Human-Assisted OOD learning), to simultaneously address both OOD generalization and detection through a human-assisted framework that labels data in the wild. Our approach strategically labels examples within a novel maximum disambiguation region, where the numbers of semantic and covariate OOD examples are roughly equal. By labeling within this region, we can maximally disambiguate the two types of OOD data, thereby maximizing the utility of the fixed labeling budget. Our algorithm first uses a noisy binary search that identifies the maximum disambiguation region with high probability. It then continues annotating inside the identified labeling region, reaping the full benefit of human feedback. Extensive experiments validate the efficacy of our framework. We observe that with only a few hundred human annotations, our method significantly outperforms existing state-of-the-art methods that do not involve human assistance, in both OOD generalization and OOD detection. Code is publicly available at https://github.com/HaoyueBaiZJU/aha.

What problem does this paper attempt to address?

This paper addresses the challenges that modern machine learning models face when they encounter distribution shifts in real-world applications. These shifts fall into two types: covariate out-of-distribution (OOD) shift and semantic OOD shift. Covariate OOD means that the domain or environment of the test data differs from that of the training data; semantic OOD means that the model encounters new classes at test time. The two types lead to two corresponding challenges: OOD generalization (handling the distribution mismatch between training and test data) and OOD detection (identifying samples from unknown classes, which the classifier should not predict). The paper proposes an integrated method, AHA (Adaptive Human-Assisted OOD learning), which addresses both challenges at once through a human-assisted framework. AHA labels data in the wild, selecting examples within a region called the "maximum disambiguation region", where the numbers of covariate and semantic OOD examples are approximately equal. Labeling in this region maximizes the ability to distinguish the two types of OOD data under a fixed annotation budget, which improves both the OOD generalization and the OOD detection performance of the model. The main contributions of the paper include:
1. For the first time, human assistance is used to improve OOD generalization and detection simultaneously, providing a natural and effective way to label wild data with heterogeneous data shifts.
2. A novel labeling strategy is proposed that targets the "maximum disambiguation region"; labeling within this region significantly enhances OOD generalization and detection.
3. Extensive experiments and ablation studies demonstrate the effectiveness of the proposed human-assisted method. With only a few hundred human annotations, AHA significantly outperforms existing state-of-the-art methods that do not use human assistance, in both OOD generalization and OOD detection.
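The two-phase procedure described above (search for the region, then label inside it) can be illustrated with a small sketch. This is not the authors' implementation; it is a simplified illustration under several assumptions that do not come from the source: the wild examples are already sorted by ascending OOD score, covariate OOD examples tend to sit at lower scores and semantic OOD examples at higher scores, in-distribution examples are ignored, and noise is handled by a majority vote over a small window of neighbours. The names `make_oracle`, `noisy_binary_search`, `label_region` and the parameter `votes` are hypothetical.

```python
import random


def make_oracle(crossover, noise=0.0, seed=0):
    """Simulated annotator (hypothetical): index -> 'covariate' | 'semantic'.

    `noise` is the probability that the returned type is flipped, which
    mimics the two populations overlapping along the score axis.
    """
    rng = random.Random(seed)

    def query(i):
        truth = "semantic" if i >= crossover else "covariate"
        if rng.random() < noise:
            return "covariate" if truth == "semantic" else "semantic"
        return truth

    return query


def noisy_binary_search(n, query, votes=5):
    """Estimate the index where the majority type flips to 'semantic'.

    n     : number of wild examples, assumed sorted by ascending OOD score
    query : callable returning the human label for one index
    votes : approximate window size labeled per step for the majority vote
    """
    lo, hi = 0, n - 1
    labeled = {}
    while lo < hi:
        mid = (lo + hi) // 2
        # Label a small window around the midpoint, reusing earlier labels.
        window = range(max(0, mid - votes // 2), min(n, mid + votes // 2 + 1))
        for i in window:
            if i not in labeled:
                labeled[i] = query(i)
        semantic = sum(labeled[i] == "semantic" for i in window)
        if semantic * 2 > len(window):
            hi = mid        # mostly semantic: crossover is at or left of mid
        else:
            lo = mid + 1    # mostly covariate: crossover is right of mid
    return lo, labeled


def label_region(center, n, budget, query, labeled):
    """Spend the remaining budget on the examples closest to `center`."""
    for i in sorted(range(n), key=lambda i: abs(i - center)):
        if len(labeled) >= budget:
            break
        if i not in labeled:
            labeled[i] = query(i)
    return labeled


if __name__ == "__main__":
    oracle = make_oracle(crossover=600, noise=0.1, seed=1)
    center, labels = noisy_binary_search(1000, oracle)
    labels = label_region(center, 1000, budget=200, query=oracle, labeled=labels)
    print("estimated crossover index:", center)
    print("labels collected:", len(labels))
```

The search phase uses only a small part of the budget (a handful of labels per halving step), so most annotations land in the neighbourhood of the estimated crossover, where the two OOD types are hardest to tell apart from scores alone. The high-probability guarantee mentioned in the abstract belongs to the paper's own algorithm, not to this majority-vote simplification.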

AHA: Human-Assisted Out-of-Distribution Generalization and Detection

Out-of-Distribution Learning with Human Feedback

Adaptive Label Smoothing for Out-of-Distribution Detection

OAL: Enhancing OOD Detection Using Latent Diffusion

The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection

Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources

Exploiting Mixed Unlabeled Data for Detecting Samples of Seen and Unseen Out-of-Distribution Classes

Out-of-Distribution (OOD) Detection and Generalization Improved by Augmenting Adversarial Mixup Samples

Learning to Augment Distributions for Out-of-Distribution Detection

Anomaly Detection under Distribution Shift

MADOD: Generalizing OOD Detection to Unseen Domains via G-Invariance Meta-Learning

Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection

Mahalanobis-Aware Training for Out-of-Distribution Detection

Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts

Towards Effective Semantic OOD Detection in Unseen Domains: A Domain Generalization Perspective

Recent Advances in OOD Detection: Problems and Approaches

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Going Beyond Conventional OOD Detection

Your Data Is Not Perfect: Towards Cross-Domain Out-of-Distribution Detection in Class-Imbalanced Data

Out-Of-Distribution Detection with Diversification (Provably)

Advancing Out-of-Distribution Detection through Data Purification and Dynamic Activation Function Design