Abstract: We study a new problem setting of question answering (QA), referred to as DocTabQA. Within this setting, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's content. Unlike traditional QA approaches, which predominantly rely on unstructured text to formulate responses, DocTabQA aims to leverage structured tables as answers to convey information clearly and systematically, thereby enhancing user comprehension and highlighting relationships between data points. To the best of our knowledge, this problem has not been previously explored. In this paper, we introduce the QTabA dataset, encompassing 300 financial documents accompanied by 1.5k manually annotated question-table pairs. Initially, we leverage Large Language Models (LLMs) such as GPT-4 to establish a baseline. However, it is widely acknowledged that LLMs encounter difficulties when tasked with generating intricate, structured outputs from long input sequences. To overcome these challenges, we present a two-stage framework, called DocTabTalk, which first retrieves relevant sentences from extensive documents and subsequently generates hierarchical tables based on these identified sentences. DocTabTalk incorporates two key technological innovations, AlignLLaMA and TabTalk, which are specifically tailored to assist GPT-4 in tackling DocTabQA, enabling it to generate well-structured, hierarchical tables with improved organization and clarity. Comprehensive experimental evaluations conducted on both the QTabA and RotoWire datasets demonstrate that DocTabTalk significantly improves the performance of GPT-4 on our proposed DocTabQA task and on the table generation task. The code and dataset are available at https://github.com/SmileWHC/DocTabQA for further research.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper proposes a new question answering (QA) task setting called **DocTabQA**. In this task, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's content. Unlike traditional QA methods that primarily rely on unstructured text to generate answers, DocTabQA aims to use structured tables as answers to convey information in a clear and systematic manner, thereby enhancing user understanding and highlighting relationships between data points.

Specifically, the paper attempts to address the following issues:

1. **Challenges in Extracting and Presenting Information from Long Documents**: Existing QA systems often struggle to effectively extract and present answers when dealing with long and dense documents. Especially in the context of increasingly large and complex data, QA systems need not only to understand the content of the document but also to organize the extracted information in a user-friendly and informative way.
2. **Difficulty in Generating Structured Outputs**: Large language models (LLMs) like GPT-4 face difficulties in generating complex, structured outputs. These models often struggle to consistently present structured data that meets the requirements when handling long input sequences.
3. **Limitations of Existing QA Systems**: Although QA systems have diversified in terms of content input, including short text snippets, long documents, plain text data, images (VQA), charts (ChartQA), document images (DocVQA), and videos (VideoQA), their output format remains primarily unstructured text. This traditional approach often overlooks the inherent structure of information and the relationships between data points, which can affect the user's thorough understanding of the context and ability to make informed decisions.

To address these issues, the paper introduces a new dataset, **QTabA**, and a two-stage framework, **DocTabTalk**. QTabA contains 300 financial documents and 1.5k manually annotated question-table pairs to support research in this new QA paradigm. DocTabTalk first retrieves the sentences relevant to the question from the long document and then generates a hierarchical table from those sentences. It combines two key technological innovations, **AlignLLaMA** and **TabTalk**, aimed at helping GPT-4 perform better on DocTabQA and on table generation. In experiments on the QTabA and RotoWire datasets, DocTabTalk significantly improves GPT-4's performance on both tasks.
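
The two-stage idea (retrieve relevant sentences, then generate a table from them) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the token-overlap retriever is a simple stand-in for the learned retrieval stage, `toy_table_generator` is a placeholder for an LLM-based table generator, and every function name here is hypothetical.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_sentences(question: str, sentences: List[str], top_k: int = 3) -> List[str]:
    """Stage 1 (stand-in): rank sentences by token overlap with the question.

    Assumption: plain lexical overlap replaces a learned retriever.
    Sentences with zero overlap are dropped; the survivors are returned
    in their original document order.
    """
    question_tokens = tokenize(question)
    scored = [(len(question_tokens & tokenize(s)), i) for i, s in enumerate(sentences)]
    # Highest overlap first; ties go to the earlier sentence.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    kept = sorted(i for score, i in ranked[:top_k] if score > 0)
    return [sentences[i] for i in kept]


def toy_table_generator(question: str, evidence: List[str]) -> str:
    """Stage 2 (placeholder): list the evidence sentences in a Markdown table.

    A real system would prompt an LLM with the question and the evidence;
    this stub ignores the question and only shows where that call would sit.
    """
    lines = ["| # | Evidence |", "| --- | --- |"]
    for n, sentence in enumerate(evidence, start=1):
        lines.append(f"| {n} | {sentence} |")
    return "\n".join(lines)


def answer_with_table(
    question: str,
    sentences: List[str],
    generate_table: Callable[[str, List[str]], str],
    top_k: int = 3,
) -> str:
    """Run both stages: retrieve evidence, then hand it to the table generator."""
    evidence = retrieve_sentences(question, sentences, top_k=top_k)
    return generate_table(question, evidence)


# Illustrative data only; these sentences are invented for the example.
document = [
    "Revenue in 2022 was 10 million dollars.",
    "The company was founded in 1990.",
    "Revenue in 2023 was 12 million dollars.",
]
print(answer_with_table("What was the revenue in 2022 and 2023?", document,
                        toy_table_generator, top_k=2))
```

Passing the generator in as a callable keeps the retrieval stage independent of whichever model produces the table, so the stub could be swapped for an actual LLM call without touching the rest of the pipeline.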

DocTabQA: Answering Questions from Long Documents Using Tables

MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables

How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset

AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry

A Survey on Table Question Answering: Recent Advances

Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

BioTABQA: Instruction Learning for Biomedical Table Question Answering

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

Accurate and Regret-aware Numerical Problem Solver for Tabular Question Answering

TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension

Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications

ReAcTable: Enhancing ReAct for Table Question Answering

CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular Data

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

PDFTriage: Question Answering over Long, Structured Documents