Abstract: Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify relevant examples. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.

What problem does this paper attempt to address?

This paper addresses the difficulty of identifying systematic problems in machine learning models. Although it is easy to observe that a model makes mistakes, it is hard to find the systematic problems behind those mistakes. Such problems include poor performance on under-represented subgroups, unexpected biases, and harmful generated content. Once a model with these problems is integrated into a software product, they may lead to project failures, media controversies, or even lawsuits.

To identify and verify these systematic problems, the authors propose the **SemSlicer** framework, a tool that supports semantic data slicing. Traditional data slicing methods rely on existing features and programmatic slicing functions, which limits their applicability and flexibility. SemSlicer instead uses large language models (LLMs) to label datasets and generate slices according to user-defined slicing criteria, so it can identify semantically coherent data subsets without relying on existing features.

### Main contributions

1. **Comprehensive view of the data slicing landscape**: The paper provides an overview of data slicing in machine learning engineering.
2. **Highly configurable framework**: SemSlicer supports semantic data slicing in multiple usage scenarios, allowing users to make flexible trade-offs between different design dimensions.
3. **Extensive evaluation**: The evaluation shows that SemSlicer generates accurate slicing functions, allows flexible cost-accuracy trade-offs, and is useful for model evaluation.

### How SemSlicer works

The SemSlicer workflow is divided into two stages:

1. **Prompt construction stage**:
   - The user specifies a slicing criterion (such as a keyword or description) and provides the dataset to be sliced.
   - SemSlicer constructs and optimizes classification instructions according to the slicing criterion, with optional human intervention.
   - SemSlicer then samples and labels a small number of examples (few-shot examples), and generates synthetic examples if needed.
2. **Data slicing stage**:
   - SemSlicer uses the constructed prompt to label the entire dataset and generate the corresponding slices.

### Design dimensions

SemSlicer's design takes four dimensions into account:

- **Slicing accuracy**: Higher accuracy makes the observed slices more reliable.
- **Latency**: Depending on the requirements of downstream tasks, the user can trade off latency against the other dimensions.
- **Human effort**: Depending on the usage scenario, SemSlicer can adjust the human effort required.
- **Computational resources**: The computational resources actually available constrain the performance and scale of the system.

### Conclusion

By leveraging the capabilities of large language models, SemSlicer overcomes the limitations of traditional data slicing methods and offers a new way to identify and verify systematic problems in machine learning models. This improves the flexibility and accuracy of data slicing and supports tasks such as model debugging, fine-grained model evaluation, and continuous model monitoring.
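
The two-stage workflow described under "How SemSlicer works" can be illustrated with a minimal Python sketch. This is not SemSlicer's actual code or API: every name here (`SlicingPrompt`, `build_prompt`, `slice_dataset`, `fake_llm`) is hypothetical, the instruction template is invented for illustration, and the LLM is replaced by a keyword-matching stub so that the example runs offline. Instruction optimization and synthetic example generation are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a text reply.
LLM = Callable[[str], str]


@dataclass
class SlicingPrompt:
    """Output of stage 1: an instruction plus optional few-shot examples."""
    criterion: str
    instruction: str
    few_shot: List[Tuple[str, bool]] = field(default_factory=list)

    def render(self, text: str) -> str:
        """Build the full prompt used to classify one dataset example."""
        parts = [self.instruction]
        for example, label in self.few_shot:
            parts.append(f"Text: {example}\nAnswer: {'yes' if label else 'no'}")
        parts.append(f"Text: {text}\nAnswer:")
        return "\n\n".join(parts)


def parse_reply(reply: str) -> bool:
    """Interpret a free-text reply as a yes/no label."""
    return reply.strip().lower().startswith("yes")


def build_prompt(criterion: str, data: List[str], llm: LLM,
                 n_few_shot: int = 2) -> SlicingPrompt:
    """Stage 1 (simplified): write an instruction, then label a small
    sample zero-shot and keep it as few-shot examples."""
    instruction = (f"Is the following text related to '{criterion}'? "
                   "Answer yes or no.")
    prompt = SlicingPrompt(criterion=criterion, instruction=instruction)
    prompt.few_shot = [(t, parse_reply(llm(prompt.render(t))))
                       for t in data[:n_few_shot]]
    return prompt


def slice_dataset(prompt: SlicingPrompt, data: List[str], llm: LLM) -> List[int]:
    """Stage 2: label every example and return indices inside the slice."""
    return [i for i, t in enumerate(data) if parse_reply(llm(prompt.render(t)))]


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM: keyword match on the final query."""
    query = prompt.rsplit("Text: ", 1)[1].lower()
    return "yes" if any(k in query for k in ("nurse", "doctor", "hospital")) else "no"


data = [
    "The nurse checked my blood pressure.",
    "I love this new phone.",
    "The doctor prescribed antibiotics.",
    "Traffic was terrible today.",
    "The hospital waiting room was crowded.",
]
# Hypothetical per-example correctness of the model under evaluation.
correct = [False, True, False, True, True]

prompt = build_prompt("healthcare", data, fake_llm)
slice_idx = slice_dataset(prompt, data, fake_llm)
slice_acc = sum(correct[i] for i in slice_idx) / len(slice_idx)
overall_acc = sum(correct) / len(correct)
print(slice_idx, slice_acc, overall_acc)
```

In this sketch, `n_few_shot` is one knob of the kind the design dimensions describe: more labeled examples make each prompt longer and costlier, in exchange for potentially better slicing accuracy. Comparing `slice_acc` with `overall_acc` is the simplest way to check whether the model under-performs on the slice.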

What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
