Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang

2024-04-09

Abstract: Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-Text Generation is a promising solution by facilitating the transformation of hybrid data into a uniformly text-formatted corpus. Although this technique has been widely studied by the NLP community, there is currently no comparative analysis on how corpora generated by different table-to-text methods affect the performance of QA systems. In this paper, we address this research gap in two steps. First, we innovatively integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data. Then, we utilize this framework in real-world industrial data to conduct extensive experiments on two types of QA systems (DSFT and RAG frameworks) with four representative methods: Markdown format, Template serialization, TPLM-based method, and LLM-based method. Based on the experimental results, we draw some empirical findings and explore the underlying reasons behind the success of some methods. We hope the findings of this work will provide a valuable reference for the academic and industrial communities in developing robust QA systems.

Computer Science

What problem does this paper attempt to address?

This paper asks how the choice of table-to-text method affects the performance of LLM-based question answering (QA) systems augmented with domain-specific data. Domain data is often hybrid, combining text with semi-structured tables, and table-to-text generation converts it into a uniform text corpus. Although table-to-text generation has been widely studied, no comparative analysis exists of how corpora produced by different methods affect domain-specific QA performance. The paper fills this gap by integrating table-to-text generation into a framework for enhancing LLM-based QA with domain hybrid data and running extensive experiments on real-world industrial data. It compares four representative methods (Markdown format, template serialization, TPLM-based, and LLM-based) on two types of QA systems (DSFT and RAG frameworks). From the results, the authors draw several empirical findings, explore why certain methods succeed, and hope the findings will serve as a useful reference for academia and industry in building robust QA systems.
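
As a rough illustration of the two simplest methods named above, the following Python sketch converts a small table into a Markdown table and into template-serialized sentences. The table contents, the function names `to_markdown` and `to_template_text`, and the sentence template are all assumptions made for illustration; the paper's actual templates and data are not reproduced here.

```python
from typing import Dict, List

Row = Dict[str, str]


def to_markdown(headers: List[str], rows: List[Row]) -> str:
    """Render the table in Markdown format: header, separator, one line per row."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row[h] for h in headers) + " |")
    return "\n".join(lines)


def to_template_text(headers: List[str], rows: List[Row], caption: str) -> str:
    """Serialize each row with a fixed sentence template (hypothetical wording).

    The first column is treated as the row key; the remaining columns
    become "the <header> is <value>" clauses.
    """
    key = headers[0]
    sentences = []
    for row in rows:
        facts = ", ".join(f"the {h} is {row[h]}" for h in headers[1:])
        sentences.append(f'In the table "{caption}", for {key} {row[key]}, {facts}.')
    return " ".join(sentences)


# Hypothetical example table, not taken from the paper's dataset.
headers = ["Model", "Ports", "Power"]
rows = [
    {"Model": "X1", "Ports": "24", "Power": "150 W"},
    {"Model": "X2", "Ports": "48", "Power": "300 W"},
]

markdown_corpus = to_markdown(headers, rows)
template_corpus = to_template_text(headers, rows, "Device specifications")
print(markdown_corpus)
print(template_corpus)
```

Either output string could then be added to the text corpus used for fine-tuning (DSFT) or for retrieval (RAG). The TPLM-based and LLM-based methods would replace the fixed template with a trained or prompted generation model and are not sketched here.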
