Abstract: We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.

What problem does this paper attempt to address?

This paper aims to evaluate how well language models perform data narration, in particular their ability to turn complex tabular data into easily understandable narratives. Existing benchmarks usually fail to capture the analytical complexity required in practical applications, and DataTales aims to fill this gap by providing 4,900 financial reports paired with their corresponding market data.

### Specific description of the problem

1. **Deficiencies of existing benchmarks**:
   - Existing data-to-text benchmarks, such as RotoWire, WikiBio and ToTTo, mainly focus on basic information conversion and lack complex analytical operations.
   - These benchmarks do not fully evaluate how well language models generate in-depth analysis or handle professional terminology.
2. **Challenges in data narration**:
   - Data narration is more than simple information conversion. It requires in-depth analysis of the data, including trend analysis, causal analysis and predictive analysis.
   - The model must process a large amount of input data and needs professional domain knowledge to generate accurate and meaningful narratives.
3. **Contributions of DataTales**:
   - It provides a new benchmark containing 4,900 financial reports and their corresponding market data, covering a wide range of financial markets, such as stocks, government bonds, currencies and commodities.
   - It emphasizes complex analytical operations, such as lookup, comparison, subtraction, rate of change, trend analysis, causal analysis and predictive analysis.
   - It includes extended historical data to simulate real-world data narration challenges.

### Main objectives

- **Evaluate the performance of language models**: Use the DataTales benchmark to evaluate existing language models on data narration tasks, in both zero-shot and fine-tuning settings.
- **Promote model development**: Reveal the deficiencies of current models in data narration and provide guidance for future model development and evaluation methods.
- **Improve the quality of data narration**: By introducing more complex analytical tasks, push language models toward generating high-quality data narratives.

### Conclusion

The DataTales benchmark provides a new tool for evaluating and improving the performance of language models in data narration tasks. By pairing a large number of financial reports with market data, the benchmark exposes the challenges language models face in generating accurate, in-depth data narratives and points out directions for future research.
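To make the simpler analytical operations named in the summary more concrete (subtraction, rate of change, comparison and trend analysis), the following is a minimal Python sketch. It is not taken from the paper: the tickers, prices and function names are hypothetical, and the trend label is just the sign of a least-squares slope, which is one possible definition among many.

```python
from statistics import mean

# Hypothetical closing prices for two made-up tickers over five trading days.
# These values are illustrative only and do not come from DataTales.
PRICES = {
    "AAA": [100.0, 102.0, 101.0, 105.0, 110.0],
    "BBB": [50.0, 49.0, 48.5, 47.0, 45.0],
}


def subtract(series):
    """Absolute change between the last and the first observation."""
    return series[-1] - series[0]


def rate_of_change(series):
    """Percentage change between the last and the first observation."""
    return (series[-1] - series[0]) * 100.0 / series[0]


def trend(series):
    """Label the direction using the sign of the least-squares slope."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    slope = numerator / denominator
    if slope > 0:
        return "upward"
    if slope < 0:
        return "downward"
    return "flat"


def best_performer(prices):
    """Comparison: the ticker with the largest percentage change."""
    return max(prices, key=lambda ticker: rate_of_change(prices[ticker]))


if __name__ == "__main__":
    for ticker, series in PRICES.items():
        print(ticker, subtract(series), rate_of_change(series), trend(series))
    print("best performer:", best_performer(PRICES))
```

A narration model has to perform this kind of arithmetic implicitly, over far more rows and columns, before it can phrase the result in domain language, which is part of why the summary describes the task as harder than basic data-to-text conversion.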

DataTales: A Benchmark for Real-World Intelligent Data Narration

Narrative Player: Reviving Data Narratives with Visuals

Data Player: Automatic Generation of Data Videos with Narration-Animation Interplay

DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts

Data Playwright: Authoring Data Videos with Annotated Narration

Movie101v2: Improved Movie Narration Benchmark

Do Text-to-Vis Benchmarks Test Real Use of Visualisations?

DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

A Benchmark for Understanding and Generating Dialogue between Characters in Stories

Tell Me a Story! Narrative-Driven XAI with Large Language Models

3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

TCube: Domain-Agnostic Neural Time-series Narration

Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models

Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension

The Stories We Tell About Data: Media Types for Data-Driven Storytelling

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems