S$^3$HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering

Fangyu Lei,Xiang Li,Yifan Wei,Shizhu He,Yiming Huang,Jun Zhao,Kang Liu

2024-06-25

Abstract: Answering multi-hop questions over hybrid factual knowledge from a given text and table (TextTableQA) is a challenging task. Existing models mainly adopt a retriever-reader framework, which has several deficiencies, such as noisy labeling when training the retriever, insufficient utilization of heterogeneous information across text and table, and limited ability to perform different reasoning operations. In this paper, we propose a three-stage TextTableQA framework, S$^3$HQA, which comprises a retriever, a selector, and a reasoner. We use a retriever with refinement training to solve the noisy labeling problem. Then, a hybrid selector considers the linked relationships between heterogeneous data to select the most relevant factual knowledge. For the final stage, instead of adapting a reading comprehension module as in previous methods, we employ a generation-based reasoner to obtain answers. This includes two approaches: a row-wise generator and an LLM prompting generator (used for the first time in this task). The experimental results demonstrate that our method achieves competitive results in the few-shot setting. When trained on the full dataset, our approach outperforms all baseline methods, ranking first on the HybridQA leaderboard.

Computation and Language

What problem does this paper attempt to address?

The paper addresses three key issues in the multi-hop Text-Table Hybrid Question Answering (TextTableQA) task:

1. **Noisy labeling**: When training retrievers, existing methods often overlook that answer annotations are only weakly supervised, so the automatically labeled pseudo-gold evidence introduces a large amount of noise.
2. **Insufficient information utilization**: After retrieval, existing methods select specific cells or paragraphs and read them to extract the final answer. This fails to fully use heterogeneous information such as table structure and the hyperlinks between cells and paragraphs, which are crucial for solving multi-hop questions.
3. **Insufficient reasoning ability**: Previous methods mainly rely on extraction modules to obtain answers, which cannot support reasoning operations such as comparison and calculation.

To address these issues, the authors propose a three-stage framework, S$^3$HQA, which consists of a retriever with refinement training, a hybrid selector, and a generation-based reasoner. Experimental results show that the method is competitive in the few-shot setting and, when trained on the full dataset, outperforms all baseline methods on the HybridQA benchmark.
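The retriever, selector, and reasoner stages can be pictured with a minimal Python sketch. This is not the authors' implementation: the `Row` type, the function names, and the token-overlap score are hypothetical stand-ins for the trained models, and the sketch assumes that evidence is organized as table rows with hyperlinked passages. The third stage only builds the input string for a generative model and leaves the generation step out.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical evidence unit: one table row plus its hyperlinked passages."""
    cells: dict[str, str]                      # column name -> cell text
    linked_passages: list[str] = field(default_factory=list)


def _overlap(question: str, text: str) -> int:
    """Toy relevance score (an assumption): count of shared lowercase tokens."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def _row_text(row: Row) -> str:
    """Flatten a row and its linked passages into one string for scoring."""
    return " ".join(row.cells.values()) + " " + " ".join(row.linked_passages)


def retrieve_rows(question: str, table: list[Row], top_k: int = 2) -> list[Row]:
    """Stage 1 (retriever): keep the top_k rows by the toy score."""
    ranked = sorted(table, key=lambda r: _overlap(question, _row_text(r)), reverse=True)
    return ranked[:top_k]


def select_evidence(question: str, rows: list[Row]) -> tuple[Row, str]:
    """Stage 2 (selector): pick the best row and its best linked passage."""
    best = max(rows, key=lambda r: _overlap(question, _row_text(r)))
    passage = max(best.linked_passages, key=lambda p: _overlap(question, p), default="")
    return best, passage


def build_reasoner_input(question: str, row: Row, passage: str) -> str:
    """Stage 3 input: linearize the selected evidence for a generative model."""
    cells = " ; ".join(f"{col} is {val}" for col, val in row.cells.items())
    return f"question: {question} row: {cells} passage: {passage}"


table = [
    Row({"Year": "2004", "City": "Athens"}, ["Athens is the capital of Greece ."]),
    Row({"Year": "2008", "City": "Beijing"}, ["Beijing is the capital of China ."]),
]
question = "What country is the 2008 host city the capital of"

candidates = retrieve_rows(question, table)
row, passage = select_evidence(question, candidates)
print(build_reasoner_input(question, row, passage))
```

In this toy example the question needs two hops: the table row identifies the 2008 host city, and the passage linked from that row supplies the country. Passing the row and its linked passage together to the generator is what lets one model input cover both hops.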


S2M: Converting Single-Turn to Multi-Turn Datasets for Conversational Question Answering

HRoT: Hybrid prompt strategy and Retrieval of Thought for Table-Text Hybrid Question Answering

TTQA-RS- A break-down prompting approach for Multi-hop Table-Text Question Answering with Reasoning and Summarization

From Easy to Hard: Two-stage Selector and Reader for Multi-hop Question Answering.

SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering

MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering

Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables

Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering

Hybrid Question Answering over Knowledge Base and Free Text.

TACR: A Table Alignment-based Cell Selection Method for HybridQA.

A Hybrid Text Generation-Based Query Expansion Method for Open-Domain Question Answering

A Neural Question Answering Model Based on Semi-Structured Tables

CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart

Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling

Ask to Understand: Question Generation for Multi-hop Question Answering

Multi-hop Question Answering for SRLGRN Augmented by Textual Relationship Modelling