AHA: Human-Assisted Out-of-Distribution Generalization and Detection

Haoyue Bai, Jifan Zhang, Robert Nowak
2024-10-10
Abstract: Modern machine learning models deployed often encounter distribution shifts in real-world applications, manifesting as covariate or semantic out-of-distribution (OOD) shifts. These shifts give rise to challenges in OOD generalization and OOD detection. This paper introduces a novel, integrated approach AHA (Adaptive Human-Assisted OOD learning) to simultaneously address both OOD generalization and detection through a human-assisted framework by labeling data in the wild. Our approach strategically labels examples within a novel maximum disambiguation region, where the number of semantic and covariate OOD data roughly equalizes. By labeling within this region, we can maximally disambiguate the two types of OOD data, thereby maximizing the utility of the fixed labeling budget. Our algorithm first utilizes a noisy binary search algorithm that identifies the maximal disambiguation region with high probability. The algorithm then continues with annotating inside the identified labeling region, reaping the full benefit of human feedback. Extensive experiments validate the efficacy of our framework. We observed that with only a few hundred human annotations, our method significantly outperforms existing state-of-the-art methods that do not involve human assistance, in both OOD generalization and OOD detection. Code is publicly available at https://github.com/HaoyueBaiZJU/aha.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenges that modern machine-learning models face when they encounter distribution shifts in real-world applications. These shifts fall into two types: covariate out-of-distribution (OOD) shift and semantic OOD shift. Covariate OOD means that the test data come from a different domain or environment than the training data, while semantic OOD means that the model encounters new classes at test time. The two types of shift lead to two corresponding challenges: OOD generalization (handling the distribution mismatch between training and test data) and OOD detection (identifying samples from unknown classes, which the classifier should not assign to any known class).
The paper proposes an integrated method, AHA (Adaptive Human-Assisted OOD learning), which addresses both challenges at once through a human-assisted framework. AHA has humans annotate data in the wild, selecting examples within a region called the "maximum disambiguation region", where the numbers of covariate and semantic OOD examples are approximately equal. Annotating in this region maximizes the ability to distinguish between the two types of OOD data under a fixed annotation budget, which in turn improves the model's OOD generalization and detection performance.
The main contributions of the paper include:
1. For the first time, human assistance is used to improve OOD generalization and detection simultaneously, providing a natural and effective way to label wild data with heterogeneous data shifts.
2. A novel annotation strategy is proposed that targets the "maximum disambiguation region"; annotating this region significantly enhances OOD generalization and detection.
3. Extensive experiments and ablation studies demonstrate the effectiveness of the proposed human-assisted method. With only a few hundred human annotations, AHA significantly outperforms existing state-of-the-art methods that do not use human assistance in both OOD generalization and OOD detection.
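
To make the labeling strategy more concrete, here is a minimal sketch of the two-stage idea: search for the point where covariate and semantic OOD examples are roughly balanced, then spend the remaining budget around it. It is not the authors' implementation. It assumes every wild example has a scalar OOD score and that semantic OOD examples tend to score higher, and it replaces the paper's noisy binary search with a simple majority vote over a small labeled window. The names `find_disambiguation_index`, `select_for_labeling`, `ask_human`, and `window` are hypothetical.

```python
import random


def find_disambiguation_index(scores, ask_human, window=5):
    """Binary-search the score-sorted pool for the rank where covariate and
    semantic OOD examples are roughly balanced (simplified majority vote)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = {}  # example index -> human label, cached so nothing is labeled twice
    lo, hi = 0, len(order) - 1
    while hi - lo > window:
        mid = (lo + hi) // 2
        # Label a small window of ranks around mid to get a noisy local vote.
        probe = order[max(lo, mid - window // 2): min(hi, mid + window // 2) + 1]
        for i in probe:
            if i not in labels:
                labels[i] = ask_human(i)
        semantic_votes = sum(labels[i] == "semantic" for i in probe)
        if 2 * semantic_votes > len(probe):
            hi = mid  # mostly semantic OOD here: balance point is at lower scores
        else:
            lo = mid  # mostly covariate OOD here: balance point is at higher scores
    return order, (lo + hi) // 2, labels


def select_for_labeling(scores, ask_human, budget, window=5):
    """Locate the balance point, then label the examples nearest to it in rank."""
    order, center, labels = find_disambiguation_index(scores, ask_human, window)
    for rank in sorted(range(len(order)), key=lambda r: abs(r - center)):
        if len(labels) >= budget:
            break
        i = order[rank]
        if i not in labels:
            labels[i] = ask_human(i)
    return center, labels


# Synthetic pool (an assumption for illustration): covariate OOD scores are
# centered at 0.0 and semantic OOD scores at 3.0, so the two groups overlap.
rng = random.Random(0)
truth = ["covariate"] * 500 + ["semantic"] * 500
scores = [rng.gauss(0.0, 1.0) for _ in range(500)]
scores += [rng.gauss(3.0, 1.0) for _ in range(500)]

center, labels = select_for_labeling(scores, lambda i: truth[i], budget=100)
print(center, len(labels))
```

In this sketch the search stage costs at most `window` labels per halving step, so the budget should be comfortably larger than `window * log2(len(scores))`. Labels gathered during the search are reused, so they count toward the same budget as the labels collected around the final position.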