BioTABQA: Instruction Learning for Biomedical Table Question Answering

Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral

DOI: https://doi.org/10.48550/arXiv.2207.02419

2022-07-06

Abstract: Table Question Answering (TQA) is an important but under-explored task. Most of the existing QA datasets are in unstructured text format and only few of them use tables as the context. To the best of our knowledge, none of TQA datasets exist in the biomedical domain where tables are frequently used to present information. In this paper, we first curate a table question answering dataset, BioTABQA, using 22 templates and the context from a biomedical textbook on differential diagnosis. BioTABQA can not only be used to teach a model how to answer questions from tables but also evaluate how a model generalizes to unseen questions, an important scenario for biomedical applications. To achieve the generalization evaluation, we divide the templates into 17 training and 5 cross-task evaluations. Then, we develop two baselines using single and multi-tasks learning on BioTABQA. Furthermore, we explore instructional learning, a recent technique showing impressive generalizing performance. Experimental results show that our instruction-tuned model outperforms single and multi-task baselines on an average by ~23% and ~6% across various evaluation settings, and more importantly, instruction-tuned model outperforms baselines by ~5% on cross-tasks.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses two gaps in Table Question Answering (TQA) for the biomedical domain:

1. **Lack of biomedical TQA datasets**: Most existing question-answering datasets are built on unstructured text, and only a few use tables as context. Biomedical information is frequently presented in tables, yet, to the authors' knowledge, no TQA dataset existed for this domain.
2. **Limited model generalization**: Many language models perform well on popular benchmarks but generalize poorly in practice, particularly in biomedical applications. Generalization to unseen question types therefore needs to be evaluated and improved.

To address these gaps, the authors take the following steps:

- **Create the BioTABQA dataset**: Using 22 question templates and context from a biomedical textbook on differential diagnosis, they construct a new table question-answering dataset. The templates are divided into 17 for training and 5 for cross-task evaluation, so the dataset serves both to train models to answer questions from tables and to evaluate how well they generalize to unseen questions.
- **Explore instruction learning**: They apply instruction tuning to improve generalization in cross-task settings. The instruction-tuned model outperforms the single-task and multi-task baselines by about 23% and 6% on average across evaluation settings, and outperforms the baselines by about 5% on cross-task evaluation.

In summary, the paper aims to fill the gap in biomedical TQA research and to improve model generalization through instruction learning, so that models cope better with questions they were not trained on.
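The template split and instruction-style input described above can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the toy table, the placeholder template ids `T01` to `T22`, the random assignment of templates to splits, and the prompt layout in `build_input` are hypothetical and are not taken from the BioTABQA paper or its released data. Only the counts (22 templates, 17 for training, 5 for cross-task evaluation) come from the abstract.

```python
import random

# Hypothetical toy table; the real tables come from a textbook on differential diagnosis.
TABLE = {
    "columns": ["Disease", "Symptom", "Test"],
    "rows": [
        ["Disease A", "fever", "blood culture"],
        ["Disease B", "cough", "chest X-ray"],
    ],
}

# Placeholder ids standing in for the 22 question templates.
TEMPLATE_IDS = [f"T{i:02d}" for i in range(1, 23)]


def split_templates(template_ids, n_train=17, seed=0):
    """Split templates into disjoint training and cross-task evaluation sets.

    The random assignment is an assumption; the summary above does not say
    how the authors chose which templates go into which set.
    """
    ids = list(template_ids)
    random.Random(seed).shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


def linearize_table(table):
    """Flatten a table into pipe-separated lines, header first."""
    header = " | ".join(table["columns"])
    rows = [" | ".join(row) for row in table["rows"]]
    return "\n".join([header] + rows)


def build_input(instruction, table, question):
    """Prepend a task instruction to the linearized table and the question."""
    return (
        f"Instruction: {instruction}\n"
        f"Table:\n{linearize_table(table)}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    train_ids, eval_ids = split_templates(TEMPLATE_IDS)
    print(len(train_ids), "training templates,", len(eval_ids), "cross-task templates")
    print(build_input(
        "Answer the question using only the table.",
        TABLE,
        "Which test is listed for Disease B?",
    ))
```

Keeping the two template sets disjoint is what makes the cross-task evaluation meaningful: a model is scored on question types it never saw during training, and the instruction text is the only signal describing the new task.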

