A Survey on Out-of-Distribution Detection in NLP

Hao Lang, Yinhe Zheng, Yixuan Li, Jian Sun, Fei Huang, Yongbin Li
2023-12-27
Abstract: Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world. Great progress has been made over the past years. This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches. First, we provide a formal definition of OOD detection and discuss several related fields. We then categorize recent algorithms into three classes according to the data they used: (1) OOD data available, (2) OOD data unavailable + in-distribution (ID) label available, and (3) OOD data unavailable + ID label unavailable. Third, we introduce datasets, applications, and metrics. Finally, we summarize existing work and present potential future research topics.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses **Out-of-Distribution (OOD) detection** in Natural Language Processing (NLP). Its goals are:

1. **Definition and Classification**: Provide a formal definition of OOD detection and distinguish it from related fields such as domain generalization, domain adaptation, and zero-shot learning.
2. **Review Progress**: Review recent progress in OOD detection, with a particular focus on NLP methods.
3. **Propose a Classification System**: Classify existing detection methods into three categories according to the availability of OOD data and In-Distribution (ID) labels:
   - OOD data is available.
   - OOD data is not available, but ID labels are available.
   - Neither OOD data nor ID labels are available.
4. **Introduce Datasets, Applications, and Evaluation Metrics**: Discuss the datasets, application scenarios, and commonly used evaluation metrics for OOD detection.
5. **Summary and Outlook**: Summarize existing work and propose future research directions.

The survey aims to fill the gap left by the lack of OOD detection reviews focused on NLP, and it emphasizes challenges specific to NLP: a discrete input space, complex output structures, and the handling of contextual information. It also discusses the advantages and disadvantages of the different methods, using the classification system above to guide future research.
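The three-way classification described above amounts to a small decision rule over two questions: is OOD data available, and are ID labels available? The sketch below is only an illustration of that rule. The names `OODSetting` and `classify_setting` are hypothetical and do not come from the paper.

```python
from enum import Enum


class OODSetting(Enum):
    """The three categories of the taxonomy (names are illustrative)."""
    OOD_DATA_AVAILABLE = 1    # (1) OOD data available
    ID_LABELS_ONLY = 2        # (2) OOD data unavailable, ID labels available
    NO_OOD_NO_LABELS = 3      # (3) OOD data unavailable, ID labels unavailable


def classify_setting(ood_data_available: bool,
                     id_labels_available: bool) -> OODSetting:
    """Map the data available to a method onto one of the three categories."""
    # Category (1) depends only on OOD data being available.
    if ood_data_available:
        return OODSetting.OOD_DATA_AVAILABLE
    # Without OOD data, the split is on whether ID labels exist.
    if id_labels_available:
        return OODSetting.ID_LABELS_ONLY
    return OODSetting.NO_OOD_NO_LABELS
```

For example, a method that sees only unlabeled in-distribution text would fall under `classify_setting(False, False)`, the third category.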