Abstract: Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.

What problem does this paper attempt to address?

This paper addresses how to use large language models (LLMs) effectively for data-centric tasks, especially when the structure and content of the input data are crucial to completing the task. Specifically, it focuses on two questions:

1. **How to decide how much data and which data to include in the prompt**: For many data-processing tasks, describing the intent in natural language is not enough. Example data must also be provided so the model can understand the specific requirements of the task. However, providing too much data may degrade performance or increase cost, so a balance has to be found.
2. **How to select representative rows from a large dataset**: Datasets in practical applications are often very large, and the entire dataset cannot be passed to the model. An effective method is therefore needed to select a small number of rows that represent the characteristics of the whole dataset.

To address these questions, the paper makes the following contributions:

- **A new dataset, SOFSET**: This dataset contains real-world NL-to-code tasks mined from StackOverflow, specifically tasks that manipulate tabular data.
- **A cluster-then-select prompting technique**: This technique first clusters data rows according to the syntactic structure of the input data, and then selects the most representative rows from each cluster to add to the prompt. Experiments show that this method outperforms a random selection baseline on tasks with a lot of syntactic variation in the input table.
- **An analysis of LLM sensitivity to the amount, selection, and position of data in the prompt**: A series of experiments studies how different amounts and kinds of input data affect model performance, and shows that including data plays a crucial role in the quality of task completion.

These contributions help improve the performance of LLMs on data-centric tasks, especially those involving complex multi-step computations and data manipulation.
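
The cluster-then-select idea described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary does not say how syntactic structure is computed, so the pattern abstraction used here (runs of digits become `D`, runs of letters become `A`, punctuation is kept) is an assumption, and the function names `cell_pattern`, `row_signature`, and `cluster_then_select` are hypothetical.

```python
import re
from collections import defaultdict


def cell_pattern(value: str) -> str:
    """Abstract a cell into a coarse syntactic pattern (assumed scheme):
    each run of digits becomes 'D', each run of letters becomes 'A',
    and all other characters are kept as-is."""
    return re.sub(
        r"[0-9]+|[A-Za-z]+",
        lambda m: "D" if m.group()[0].isdigit() else "A",
        value,
    )


def row_signature(row):
    """A row's signature is the tuple of its cell patterns."""
    return tuple(cell_pattern(cell) for cell in row)


def cluster_then_select(rows, k):
    """Group rows by signature, then pick up to k rows round-robin,
    visiting the largest clusters first so that every syntactic
    variant is represented before any variant is repeated."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row_signature(row)].append(row)

    # Largest clusters first; ties keep first-seen order (stable sort).
    ordered = sorted(clusters.values(), key=len, reverse=True)

    selected = []
    depth = 0
    while len(selected) < min(k, len(rows)):
        for cluster in ordered:
            if depth < len(cluster) and len(selected) < k:
                selected.append(cluster[depth])
        depth += 1
    return selected


# Hypothetical input table with three syntactic variants.
rows = [
    ["2021-01-05", "12"],
    ["2021-02-11", "7"],
    ["Jan 5, 2021", "12"],
    ["2021-03-09", "n/a"],
    ["2021-04-01", "3"],
]
prompt_rows = cluster_then_select(rows, 3)
```

With a budget of three rows, this sketch picks one row per syntactic variant (the ISO-style date, the `Jan 5, 2021` date, and the `n/a` value), whereas a random sample of three rows could easily miss the two rare formats.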

Solving Data-centric Tasks using Large Language Models

Knowing When to Ask -- Bridging Large Language Models and Data

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

To prompt or not to prompt: Navigating the use of large language models for integrating and modeling heterogeneous data

Supporting Sensemaking of Large Language Model Outputs at Scale

NLPBench: Evaluating Large Language Models on Solving NLP Problems

LLM4DS: Evaluating Large Language Models for Data Science Code Generation

Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts

Reinforcement Learning Problem Solving with Large Language Models

Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

Automatic Prompt Selection for Large Language Models

Scaling Data-Driven Building Energy Modelling using Large Language Models

Spoken Language Intelligence of Large Language Models for Language Learning

Large Language Model for Table Processing: A Survey

Large Language Models in Healthcare: A Comprehensive Benchmark

Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science

Large Language Model Prompting Techniques for Advancement in Clinical Medicine

Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks

PromptAid: Prompt Exploration, Perturbation, Testing and Iteration using Visual Analytics for Large Language Models

Can Large Language Models Provide Emergency Medical Help Where There Is No Ambulance? A Comparative Study on Large Language Model Understanding of Emergency Medical Scenarios in Resource-Constrained Settings