Abstract: Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents a considerable challenge in the field of tabular data analysis, paving the way for more advanced research opportunities.

What problem does this paper attempt to address?

The paper aims to address two major issues in the current field of tabular data analysis:

1. **Lack of advanced data analysis tasks**: Existing work, such as the Text2SQL and TableQA datasets, mainly focuses on basic operations of descriptive analysis (e.g., simple queries and summaries), while neglecting tasks that require deeper analytical capabilities, such as prediction and chart generation.
2. **Handling unclear queries**: In practical applications, users' queries are often unclear or lack parameters, which poses challenges for automated data analysis tools.

To address these issues, the paper proposes a new benchmark dataset named Text2Analysis, which includes both advanced data analysis tasks and unclear queries. Specifically, the benchmark covers the following points:

- **Advanced data analysis tasks**: Basic insights (e.g., ranking, trends), prediction (forecasting future values from historical data), and chart generation (recommending and constructing charts).
- **Unclear queries**: These queries lack the key information needed to perform a specific task, requiring the model not only to understand natural language but also to possess enough data analysis capability to recommend an appropriate analysis.

Additionally, the paper develops five innovative and reliable annotation methods, leveraging the capabilities of large language models to improve annotation efficiency and data volume while ensuring the quality of the dataset. The final dataset contains 2249 query-result pairs involving 347 different tables. Five state-of-the-art models were evaluated using three evaluation metrics (executable code ratio, pass rate, and regression metrics), and the results show that these models perform well on clear queries but face challenges with complex libraries and unclear queries.

In summary, the goal of this paper is to advance research in the field of tabular data analysis, particularly in advanced data analysis tasks and the handling of unclear user queries.
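
Two of the evaluation metrics named above, executable code ratio and pass rate, can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since this summary does not spell them out: executable code ratio is taken to be the fraction of generated programs that run without raising an exception, and pass rate the fraction whose output equals the annotated result. The helper names `run_candidate` and `score`, and the convention that each program stores its answer in a variable called `result`, are hypothetical and are not taken from the paper.

```python
from typing import Any


def run_candidate(code: str) -> tuple[bool, Any]:
    """Run one generated program; return (ran_without_error, value of `result`)."""
    namespace: dict[str, Any] = {}
    try:
        # Assumption: generated code is trusted or sandboxed elsewhere.
        exec(code, namespace)
    except Exception:
        return False, None
    return True, namespace.get("result")


def score(candidates: list[str], expected: list[Any]) -> dict[str, float]:
    """Compute the two assumed metrics over paired programs and expected answers."""
    executed = 0
    passed = 0
    for code, want in zip(candidates, expected):
        ok, got = run_candidate(code)
        if ok:
            executed += 1
            if got == want:
                passed += 1
    total = len(candidates)
    return {
        "executable_ratio": executed / total,
        "pass_rate": passed / total,
    }


if __name__ == "__main__":
    programs = [
        "result = sum([1, 2, 3])",   # runs and matches the expected answer
        "result = 1 / 0",            # raises, so it is not executable
        "result = max([4, 9])",      # runs but gives the wrong answer
    ]
    print(score(programs, [6, 0, 4]))
```

In this toy run two of the three programs execute and one returns the expected value, so the executable ratio is 2/3 and the pass rate is 1/3. A real harness would also need a tolerance for numeric outputs and a comparison rule for charts, neither of which is sketched here.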

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Benchmarking Table Comprehension In The Wild

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

A Survey on Table Question Answering: Recent Advances

TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

Bridging the Gap: Deciphering Tabular Data Using Large Language Model

A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions

TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation

SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

UNITE: A Unified Benchmark for Text-to-SQL Evaluation

DocTabQA: Answering Questions from Long Documents Using Tables

MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering

Towards Text-to-SQL over Aggregate Tables

MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data

SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types

Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries

CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models