Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction

Zheye Deng, Chunkit Chan, Weiqi Wang, Yuxi Sun, Wei Fan, Tianshi Zheng, Yauwai Yim, Yangqiu Song

2024-04-22

Abstract: The task of condensing large chunks of textual information into concise and structured tables has gained attention recently due to the emergence of Large Language Models (LLMs) and their potential benefit for downstream tasks, such as text summarization and text mining. Previous approaches often generate tables that directly replicate information from the text, limiting their applicability in broader contexts, as text-to-table generation in real-life scenarios necessitates information extraction, reasoning, and integration. However, there is a lack of both datasets and methodologies for this task. In this paper, we introduce LiveSum, a new benchmark dataset created for generating summary tables of competitions based on real-time commentary texts. We evaluate the performance of state-of-the-art LLMs on this task in both fine-tuning and zero-shot settings, and additionally propose a novel pipeline called $T^3$ (Text-Tuple-Table) to improve their performance. Extensive experimental results demonstrate that LLMs still struggle with this task even after fine-tuning, while our approach can offer substantial performance gains without explicit training. Further analyses demonstrate that our method exhibits strong generalization abilities, surpassing previous approaches on several other text-to-table datasets. Our code and data can be found at

Computation and Language

What problem does this paper attempt to address?

This paper addresses the lack of information integration in current text-to-table generation. Existing methods mostly copy information directly from the text, whereas practical scenarios require information extraction, reasoning, and integration. The paper introduces LiveSum, a new benchmark dataset for generating summary tables of competitions from real-time commentary, and a pipeline called $T^3$ (Text-Tuple-Table) that improves the performance of large language models on the task. Experiments show that LLMs still struggle with this task even after fine-tuning, while $T^3$ offers substantial performance gains without explicit training and generalizes well, surpassing previous approaches on several other text-to-table datasets.
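
The text-to-tuple-to-table idea can be illustrated with a small Python sketch. This is not the authors' implementation: the `(team, event)` tuple format, the function names `integrate` and `to_table`, and the sample data are all assumptions made for illustration. The extraction step, which a pipeline of this kind would delegate to a model, is replaced here by a hand-written list, so the sketch shows only how extracted tuples might be aggregated and laid out as a summary table.

```python
from collections import Counter

# Hypothetical output of an extraction step: one (team, event) tuple
# per event mentioned in the commentary. Invented sample data.
tuples = [
    ("Home", "Shots"),
    ("Away", "Fouls"),
    ("Home", "Shots"),
    ("Home", "Goals"),
    ("Away", "Shots"),
]


def integrate(extracted):
    """Aggregate repeated tuples into counts (the integration step)."""
    return Counter(extracted)


def to_table(counts, rows, columns):
    """Lay the counts out as a header row plus one row per team."""
    header = ["Team"] + columns
    body = [
        [row] + [str(counts.get((row, col), 0)) for col in columns]
        for row in rows
    ]
    return [header] + body


def render(table):
    """Render the table as pipe-separated plain text."""
    return "\n".join(" | ".join(cells) for cells in table)


counts = integrate(tuples)
table = to_table(counts, ["Home", "Away"], ["Goals", "Shots", "Fouls"])
print(render(table))
```

Counting tuples rather than copying spans is the point of the sketch: a cell such as the number of shots never appears verbatim in the commentary and has to be aggregated across many sentences.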

QTSumm: Query-Focused Summarization over Tabular Data

QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models

Reasoning-Aware Query-Focused Summarization over Multi-Table Data

Long Text and Multi-Table Summarization: Dataset and Method

Structsum Generation for Faster Text Comprehension

Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization

UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs

SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Text-to-Table: A New Way of Information Extraction

Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method

CLTS+: A New Chinese Long Text Summarization Dataset with Abstractive Summaries

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

CTRLsum: Towards Generic Controllable Text Summarization