SEMQA: Semi-Extractive Multi-Source Question Answering

Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler

2024-07-01

Abstract: Recently proposed long-form question answering (QA) systems, supported by large language models (LLMs), have shown promising capabilities. Yet, attributing and verifying their generated abstractive answers can be difficult, and automatically evaluating their accuracy remains an ongoing challenge. In this work, we introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion. Specifically, Semi-extractive Multi-source QA (SEMQA) requires models to output a comprehensive answer, while mixing factual quoted spans (copied verbatim from given input sources) and non-factual free-text connectors that glue these spans together into a single cohesive passage. This setting bridges the gap between the outputs of well-grounded but constrained extractive QA systems and more fluent but harder to attribute fully abstractive answers. Particularly, it enables a new mode for language models that leverages their advanced language generation capabilities, while also producing fine in-line attributions by design that are easy to verify, interpret, and evaluate. To study this task, we create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions, and define text-based evaluation metrics. Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of QuoteSum for developing and studying such consolidation capabilities.
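To make the "easy to verify" property concrete, here is a minimal Python sketch under stated assumptions: the `[k: text]` markup for quoted spans, the helper names `parse_answer` and `unverified_spans`, and the toy sources are all hypothetical, not QuoteSum's actual format or the paper's evaluation code. It splits an answer into quoted spans and free-text connectors, then flags any quoted span that is not a verbatim substring of the source it cites.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL markup (assumption, not QuoteSum's real format): a quoted span
# is written as [k: text], where k is the 1-based index of the source passage
# it was copied from. Everything outside brackets is a free-text connector.
SPAN_PATTERN = re.compile(r"\[(\d+):\s*([^\]]+)\]")


@dataclass
class QuotedSpan:
    source_id: int  # 1-based index into the list of sources
    text: str       # text claimed to be copied verbatim from that source


def parse_answer(answer: str) -> tuple[list[QuotedSpan], list[str]]:
    """Split a semi-extractive answer into quoted spans and connector text."""
    spans = [QuotedSpan(int(m.group(1)), m.group(2).strip())
             for m in SPAN_PATTERN.finditer(answer)]
    # re.split with two capture groups yields [text, id, span, text, id, span, ...],
    # so every third element is connector text outside the brackets.
    connectors = [c.strip() for c in SPAN_PATTERN.split(answer)[::3] if c.strip()]
    return spans, connectors


def unverified_spans(answer: str, sources: list[str]) -> list[QuotedSpan]:
    """Return quoted spans that are NOT verbatim substrings of their cited source."""
    spans, _ = parse_answer(answer)
    bad = []
    for span in spans:
        idx = span.source_id - 1
        # A span fails if it cites a nonexistent source or was not copied exactly.
        if not (0 <= idx < len(sources)) or span.text not in sources[idx]:
            bad.append(span)
    return bad


# Toy example data (hypothetical, for illustration only).
sources = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Gustave Eiffel's company designed and built the tower.",
]
answer = ("The tower [1: was completed in 1889], and "
          "[2: Gustave Eiffel's company designed and built the tower].")

spans, connectors = parse_answer(answer)
print(len(spans), connectors)  # prints: 2 ['The tower', ', and', '.']
print(unverified_spans(answer, sources))  # prints: []
```

Because quoted spans must be copied verbatim, checking attribution in this sketch reduces to exact string matching, while the connectors are excluded from the check by design. A practical checker would likely need whitespace- and tokenization-aware matching, which this sketch omits.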

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

The paper addresses the difficulty of verifying and evaluating the answers generated by long-form question answering (QA) systems. To this end, it proposes a new QA task, Semi-Extractive Multi-Source Question Answering (SEMQA), in which a model produces a comprehensive answer by aggregating information from multiple different sources in a semi-extractive manner. The answer must combine factual quoted spans, copied verbatim from the given input sources, with non-factual free-text connectors that join those spans into a single coherent paragraph. The main contributions of the paper are:
1. Introducing and defining the SEMQA task.
2. Creating QuoteSum, the first dataset for this task, which contains high-quality, human-written semi-extractive answers.
3. Experimenting with various large language models (LLMs) and evaluating them through text-based metrics and user studies, which reveals how challenging SEMQA is and motivates future research.
