DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

Da Song, Zhijie Wang, Yuheng Huang, Lei Ma, Tianyi Zhang
DOI: https://doi.org/10.1145/3544548.3580741
2023-03-03
Abstract: Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens were able to accurately find nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that has no interaction or visualization support.
Human-Computer Interaction, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of detecting and understanding out-of-distribution (OOD) data in machine learning models for Natural Language Processing (NLP). Specifically, it proposes DeepLens, an interactive system designed to help users:

1. **Automatically detect** OOD data (requirement N1): as the user adjusts a threshold, the system dynamically updates the proportion of test samples flagged as OOD.
2. **Understand why certain data is flagged as OOD** (requirement N2): the system contrasts OOD samples with in-distribution (ID) samples and highlights key words in the text.
3. **Identify different types of OOD data** (requirement N3): OOD samples are grouped with a text clustering method, and keywords are shown for each group to reveal its likely theme.
4. **Compare OOD data with ID data** (requirement N4): users filter data by predicted label and then compare OOD and ID samples that share the same label.
5. **Explore OOD issues from both global and local perspectives** (requirement N5): a cluster view gives an overall picture, and users can drill down into specific groups.

DeepLens computes OOD scores with the Maximum Softmax Probability (MSP) calibration method. It combines several visualizations (icon arrays, scatter plots, word clouds, etc.) with a neuron activation analysis algorithm that highlights important words in the text, so that users can efficiently explore and understand OOD issues in large text corpora. The authors evaluated the system in a within-subjects user study with 24 participants: those using DeepLens accurately found nearly twice as many types of OOD issues, with 22% more confidence, than with a variant of DeepLens that lacks interaction and visualization support.
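To make requirements N1 and N4 concrete, here is a minimal sketch of MSP-based OOD flagging with an adjustable threshold, followed by grouping samples by predicted label into ID and OOD sets. It assumes the classifier exposes raw logits for each sample and that a sample counts as OOD when its maximum softmax probability falls below the threshold. The function names (`msp_scores`, `flag_ood`, `split_by_label`) and the threshold values are hypothetical illustrations, not DeepLens's actual implementation or API.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_scores(batch_logits):
    """Return (predicted_label, max_softmax_probability) for each sample."""
    results = []
    for logits in batch_logits:
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs[label]))
    return results


def flag_ood(scores, threshold):
    """Assumed rule: flag a sample as OOD when its MSP is below `threshold`."""
    return [msp < threshold for _, msp in scores]


def ood_proportion(flags):
    """Fraction of samples currently flagged as OOD."""
    return sum(flags) / len(flags) if flags else 0.0


def split_by_label(scores, flags):
    """Group sample indices by predicted label into ID and OOD lists."""
    groups = defaultdict(lambda: {"id": [], "ood": []})
    for i, ((label, _), is_ood) in enumerate(zip(scores, flags)):
        groups[label]["ood" if is_ood else "id"].append(i)
    return dict(groups)


if __name__ == "__main__":
    logits = [
        [4.0, 0.0, 0.0],  # confident prediction of class 0
        [0.2, 0.1, 0.0],  # near-uniform, so low MSP
        [0.0, 3.0, 0.0],  # fairly confident prediction of class 1
        [0.0, 0.1, 0.2],  # near-uniform, so low MSP
    ]
    scores = msp_scores(logits)
    # Raising the threshold flags more samples as OOD.
    for t in (0.5, 0.95):
        print(t, ood_proportion(flag_ood(scores, t)))  # prints 0.5 0.5, then 0.95 0.75
    print(split_by_label(scores, flag_ood(scores, 0.5)))
```

Moving the threshold only re-applies a comparison to precomputed scores, which is what makes it cheap to update the flagged proportion interactively. Grouping by predicted label then lets OOD and ID samples that received the same label be inspected side by side.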