Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables

Wenting Zhao, Ye Liu, Yao Wan, Yibo Wang, Zhongfen Deng, Philip S. Yu
2023-09-21
Abstract: Question answering on tabular data (a.k.a. TableQA), which aims at generating answers to questions grounded on a provided table, has gained significant attention recently. Prior work primarily produces concise factual responses through information extraction from individual or limited table cells, lacking the ability to reason across diverse table cells. Yet, the realm of free-form TableQA, which demands intricate strategies for selecting relevant table cells and the sophisticated integration and inference of discrete data fragments, remains mostly unexplored. To this end, this paper proposes a generalized three-stage approach: Table-to-Graph conversion and cell localizing, external knowledge retrieval, and the fusion of table and text (called TAG-QA), to address the challenge of inferring long free-form answers in generative TableQA. In particular, TAG-QA (1) locates relevant table cells using a graph neural network to gather intersecting cells between relevant rows and columns, (2) leverages external knowledge from Wikipedia, and (3) generates answers by integrating both tabular data and natural linguistic information. Experiments showcase the superior capabilities of TAG-QA in generating sentences that are both faithful and coherent, particularly when compared to several state-of-the-art baselines. Notably, TAG-QA surpasses the robust pipeline-based baseline TAPAS by 17% and 14% in terms of BLEU-4 and PARENT F-score, respectively. Furthermore, TAG-QA outperforms the end-to-end model T5 by 16% and 12% on BLEU-4 and PARENT F-score, respectively.
Computation and Language
What problem does this paper attempt to address?
The paper addresses free-form TableQA: generating long, coherent answers to questions grounded in a provided table. Most existing work extracts information from individual or limited table cells to produce short factual answers and cannot reason across multiple cells. Free-form TableQA instead requires more complex strategies for selecting relevant cells and for integrating and inferring over discrete data fragments, and it remains largely unexplored. To this end, the paper proposes a general three-stage method, called TAG-QA, for generating long free-form answers:

1. **Table-to-graph conversion and cell localization**: Convert the table to a graph and use a graph neural network (GNN) to locate relevant cells, gathering the cells at the intersection of relevant rows and columns.
2. **External knowledge retrieval**: Retrieve external knowledge from Wikipedia.
3. **Table-and-text fusion**: Generate the answer by integrating the tabular data with the natural language information.

The experimental results show that TAG-QA generates sentences that are both faithful and coherent, particularly when compared with several state-of-the-art baselines. TAG-QA outperforms the strong pipeline-based baseline TAPAS by 17% and 14% on BLEU-4 and PARENT F-score, respectively, and outperforms the end-to-end model T5 by 16% and 12% on the same metrics. The paper thus addresses the shortcomings of existing methods on the free-form TableQA task and provides a new framework for generating high-quality free-form answers.
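The three stages described above can be made concrete with a small toy sketch. It is not the authors' implementation: plain word overlap stands in for the GNN-based cell localizer and for the Wikipedia retriever, and the fusion step only builds the text that a sequence-to-sequence generator might receive. All names here (`localize_cells`, `retrieve`, `fuse`, the `Table` layout, and the sample data) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical table layout: column header -> list of cell values, one per row.
Table = Dict[str, List[str]]


def tokens(text: str) -> set:
    """Lowercased word set with simple punctuation stripped."""
    return {t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")}


def overlap(a: str, b: str) -> int:
    """Number of shared words; a stand-in for a learned relevance score."""
    return len(tokens(a) & tokens(b))


def localize_cells(question: str, table: Table,
                   top_rows: int = 1, top_cols: int = 2) -> List[Tuple[int, str, str]]:
    """Stage 1 (toy): pick relevant rows and columns, keep their intersection."""
    headers = list(table)
    n_rows = len(table[headers[0]])
    # Assumption: word overlap replaces the paper's GNN scoring.
    cols = sorted(headers, key=lambda h: overlap(question, h), reverse=True)[:top_cols]
    rows = sorted(
        range(n_rows),
        key=lambda r: overlap(question, " ".join(table[h][r] for h in headers)),
        reverse=True,
    )[:top_rows]
    return [(r, h, table[h][r]) for r in sorted(rows) for h in headers if h in cols]


def retrieve(question: str, cells: List[Tuple[int, str, str]],
             passages: List[str], k: int = 1) -> List[str]:
    """Stage 2 (toy): rank passages against the question plus localized cells."""
    query = question + " " + " ".join(value for _, _, value in cells)
    return sorted(passages, key=lambda p: overlap(query, p), reverse=True)[:k]


def fuse(question: str, cells: List[Tuple[int, str, str]], context: List[str]) -> str:
    """Stage 3 (toy): linearize table cells and text into one generator input."""
    cell_text = " | ".join(f"{header} is {value}" for _, header, value in cells)
    return f"question: {question} table: {cell_text} context: {' '.join(context)}"


# Made-up sample data, for illustration only.
table = {
    "Player": ["Ann Lee", "Bo Chen"],
    "Team": ["Hawks", "Owls"],
    "Points": ["31", "24"],
}
passages = [
    "Ann Lee is a guard who plays for the Hawks.",
    "The Owls were founded in 1990.",
]
question = "How many points did Ann Lee score and for which team?"

cells = localize_cells(question, table)
context = retrieve(question, cells, passages)
fused = fuse(question, cells, context)
print(fused)
```

In this sketch the fused string would be passed to a generative model to produce the free-form answer; that step is omitted because the summary does not specify the generator's interface.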