What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing

Chenyang Yang, Yining Hong, Grace A. Lewis, Tongshuang Wu, Christian Kästner
2024-09-14
Abstract: Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify relevant examples. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.
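To make the contrast in the abstract concrete, here is a minimal Python sketch, assuming a toy sentiment dataset that already carries model predictions. `keyword_slice` plays the role of a traditional programmatic slicing function tied to a surface feature, while `make_semantic_slice` wraps a natural-language criterion in a yes/no prompt to an LLM. All names and the prompt wording are hypothetical illustrations, not SemSlicer's API, and `fake_llm` is a keyword-based stand-in so the example runs offline; a real LLM would be needed to capture meaning beyond keywords. Comparing accuracy on a slice with overall accuracy is one simple way to flag an under-performing slice.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def keyword_slice(example: Example) -> bool:
    """Traditional programmatic slicing function: relies on a surface feature (a fixed keyword)."""
    return "refund" in example["text"].lower()


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline.

    It answers 'yes' when the text after the last 'Text:' marker contains a
    money-related word; a real setup would send the prompt to an actual model.
    """
    text = prompt.rsplit("Text:", 1)[-1].lower()
    money_words = ("refund", "charge", "price", "paid", "bill")
    return "yes" if any(word in text for word in money_words) else "no"


def make_semantic_slice(criterion: str, llm: Callable[[str], str]) -> Callable[[Example], bool]:
    """Semantic slicing function: asks an LLM whether an example matches a natural-language criterion."""
    def slice_fn(example: Example) -> bool:
        prompt = (
            f"Does the following text match the criterion '{criterion}'? "
            f"Answer yes or no.\nText: {example['text']}"
        )
        return llm(prompt).strip().lower().startswith("yes")
    return slice_fn


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose model prediction matches the gold label."""
    if not examples:
        return float("nan")
    return sum(e["pred"] == e["label"] for e in examples) / len(examples)


# Toy sentiment data with model predictions (invented for illustration).
data: List[Example] = [
    {"text": "I want a refund for this order", "label": "neg", "pred": "pos"},
    {"text": "They charged me twice", "label": "neg", "pred": "pos"},
    {"text": "Great service, fast delivery", "label": "pos", "pred": "pos"},
    {"text": "The app crashes on startup", "label": "neg", "pred": "neg"},
]

billing = make_semantic_slice("mentions billing or payment problems", fake_llm)
keyword_hits = [e for e in data if keyword_slice(e)]
semantic_hits = [e for e in data if billing(e)]

print(len(keyword_hits), len(semantic_hits))    # 1 2
print(accuracy(data), accuracy(semantic_hits))  # 0.5 0.0
```

Here the criterion-based slice also catches "They charged me twice", which the fixed keyword misses, and its accuracy (0.0) falls well below the overall 0.5.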
Software Engineering, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty of identifying systematic problems in machine-learning models. Models inevitably make mistakes, but it is hard to uncover the systematic problems behind those mistakes, such as poor performance on under-represented subgroups, unexpected biases, or the generation of harmful content. Once such models are integrated into software products, these problems can lead to project failures, media controversies, or even lawsuits.

To help identify and validate these systematic problems, the authors propose **SemSlicer**, a framework that supports semantic data slicing. Traditional data-slicing methods rely on existing features and programmatic slicing functions, which limits their applicability and flexibility. SemSlicer instead uses large language models (LLMs) to annotate datasets and generate slices from user-defined slicing criteria, so it can identify semantically coherent data subsets without relying on existing features.

### Main contributions

1. **Comprehensive view of the data-slicing landscape**: The paper provides an overview of data slicing in machine-learning engineering.
2. **Highly configurable framework**: SemSlicer supports semantic data slicing in multiple usage scenarios and lets users make flexible trade-offs between different design dimensions.
3. **Extensive evaluation**: The evaluation shows that SemSlicer generates accurate slicing functions, allows flexible cost-accuracy trade-offs, and is useful for model evaluation, for example by reliably identifying under-performing data slices.

### How SemSlicer works

SemSlicer's workflow has two stages:

1. **Prompt construction stage**:
   - The user specifies a slicing criterion (such as a keyword or a description) and provides the dataset to be sliced.
   - SemSlicer constructs and optimizes a classification instruction from the slicing criterion; a human can intervene if necessary.
   - SemSlicer then samples a small number of examples and labels them for use as few-shot examples, generating synthetic examples if needed.
2. **Data-slicing stage**:
   - SemSlicer uses the constructed prompt to label the entire dataset and generates the corresponding slice.

### Design dimensions

SemSlicer's design considers four dimensions:

- **Slicing accuracy**: Higher accuracy makes the observed slices more reliable.
- **Latency**: Depending on the requirements of downstream tasks, users can trade latency against the other dimensions.
- **Human effort**: SemSlicer can adjust the amount of human effort it requires to fit the usage scenario.
- **Computational resources**: The computational resources actually available determine the performance and scale of the system.

### Conclusion

By leveraging LLMs, SemSlicer overcomes the limitations of traditional data-slicing methods and offers a new way to identify and validate systematic problems in machine-learning models. This improves the flexibility and accuracy of data slicing and supports tasks such as model debugging, fine-grained model evaluation, and continuous model monitoring.
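The two-stage workflow described above can be sketched in Python as follows, assuming the LLM is exposed as a plain prompt-to-text function. `SlicingPrompt`, `build_prompt`, `slice_dataset`, the prompt wording, and the keyword-based `fake_llm` stub are hypothetical illustrations, not SemSlicer's actual implementation. The `refine` hook stands in for optional human intervention on the instruction, and synthetic example generation is left out.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass
class SlicingPrompt:
    """A classification instruction plus labeled few-shot examples."""
    instruction: str
    few_shot: List[Tuple[str, str]] = field(default_factory=list)  # (text, "yes"/"no")

    def render(self, text: str) -> str:
        shots = "".join(f"Text: {t}\nAnswer: {a}\n\n" for t, a in self.few_shot)
        return f"{self.instruction}\n\n{shots}Text: {text}\nAnswer:"


def build_prompt(criterion: str, dataset: List[str], llm: LLM, k: int = 2,
                 refine: Optional[Callable[[str], str]] = None, seed: int = 0) -> SlicingPrompt:
    """Stage 1 (prompt construction): instruction from the criterion, then few-shot examples."""
    instruction = f"Answer yes or no: does the text match the criterion '{criterion}'?"
    if refine is not None:  # optional human intervention on the instruction
        instruction = refine(instruction)
    # Sample a few examples and label them with the zero-shot prompt (a human could label them instead).
    zero_shot = SlicingPrompt(instruction)
    sampled = random.Random(seed).sample(dataset, min(k, len(dataset)))
    shots = [(text, llm(zero_shot.render(text)).strip().lower()) for text in sampled]
    # Synthetic example generation (optional in the described workflow) is omitted here.
    return SlicingPrompt(instruction, shots)


def slice_dataset(prompt: SlicingPrompt, dataset: List[str], llm: LLM) -> List[str]:
    """Stage 2 (data slicing): label every example and keep those answered 'yes'."""
    return [text for text in dataset if llm(prompt.render(text)).strip().lower().startswith("yes")]


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stand-in for an LLM: keyword check on the final 'Text:' segment."""
    text = prompt.rsplit("Text:", 1)[-1].lower()
    return "yes" if any(w in text for w in ("refund", "charge", "price", "paid", "bill")) else "no"


data = [
    "I want a refund for this order",
    "They charged me twice",
    "Great service, fast delivery",
    "The app crashes on startup",
]
prompt = build_prompt("mentions billing or payment problems", data, fake_llm, k=2)
print(slice_dataset(prompt, data, fake_llm))
# ['I want a refund for this order', 'They charged me twice']
```

In this sketch, `k` is one place where the design dimensions surface: more few-shot examples may improve accuracy, but they lengthen every prompt sent during the data-slicing stage, which adds latency and compute cost across the whole dataset.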