Understanding Retrieval Augmentation for Long-Form Question Answering

Hung-Ting Chen, Fangyuan Xu, Shane Arora, Eunsol Choi

2023-10-19

Abstract: We present a study of retrieval-augmented language models (LMs) on long-form question answering. We analyze how retrieval augmentation impacts different LMs, by comparing answers generated from models while using the same evidence documents, and how differing quality of the retrieval document set impacts the answers generated from the same LM. We study various attributes of generated answers (e.g., fluency, length, variance) with an emphasis on the attribution of generated long-form answers to in-context evidence documents. We collect human annotations of answer attribution and evaluate methods for automatically judging attribution. Our study provides new insights on how retrieval augmentation impacts long, knowledge-rich text generation of LMs. We further identify attribution patterns for long text generation and analyze the main culprits of attribution errors. Together, our analysis reveals how retrieval augmentation impacts long knowledge-rich text generation and provides directions for future work.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper studies how retrieval-augmented language models (LMs) perform on long-form question answering (LFQA). It focuses on the following aspects:

1. **Impact of Retrieval Augmentation on Different Language Models**:
   - Compare the answers generated by different LMs when they are given the same evidence documents.
   - Analyze how retrieval document sets of differing quality affect the answers generated by the same LM.
2. **Analysis of Generated Answer Attributes**:
   - Study attributes of the generated answers, such as fluency, length, and variance.
   - Place particular emphasis on the attribution of generated long-form answers to the in-context evidence documents.
3. **Human Annotation and Automatic Evaluation**:
   - Collect human annotations of answer attribution.
   - Evaluate methods for automatically judging attribution.
4. **Impact of Retrieval Augmentation on the Generation of Long, Knowledge-Rich Texts**:
   - Reveal how retrieval augmentation affects the generation of long, knowledge-rich text.
   - Identify attribution patterns in long text generation and the main causes of attribution errors.

Together, these analyses show how retrieval augmentation affects long, knowledge-rich text generation and point to directions for future work.
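To make the idea of "automatically judging attribution" concrete, here is a minimal sketch. It is not the paper's method, which this summary does not describe; it substitutes a crude token-overlap heuristic, and every name in it (`support_score`, `judge_attribution`, the `threshold` value) is hypothetical. Each answer sentence is marked as supported when at least one evidence document contains a large enough share of its words.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (a crude stand-in for real matching)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(answer):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def support_score(sentence, document):
    """Fraction of the sentence's tokens that also appear in the document."""
    sent = tokens(sentence)
    if not sent:
        return 0.0
    return len(sent & tokens(document)) / len(sent)


def judge_attribution(answer, documents, threshold=0.7):
    """Return (sentence, supported) pairs.

    A sentence counts as supported if its best overlap score against any
    evidence document reaches the (assumed, untuned) threshold.
    """
    results = []
    for sentence in split_sentences(answer):
        best = max((support_score(sentence, d) for d in documents), default=0.0)
        results.append((sentence, best >= threshold))
    return results


if __name__ == "__main__":
    docs = ["The Eiffel Tower is located in Paris and was completed in 1889."]
    answer = ("The Eiffel Tower was completed in 1889. "
              "It is painted bright green every year.")
    for sentence, supported in judge_attribution(answer, docs):
        print(supported, "|", sentence)
```

Word overlap cannot detect paraphrase or contradiction, so a heuristic like this would at most serve as a rough baseline to compare against the human attribution annotations.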

LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering

Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

Attribute or Abstain: Large Language Models as Long Document Assistants

Reimagining Retrieval Augmented Language Models for Answering Queries

Retrieval-Augmented Generation for Large Language Models: A Survey

Modeling Exemplification in Long-form Question Answering via Retrieval

Investigating Answerability of LLMs for Long-Form Question Answering

Retrieval meets Long Context Large Language Models

Enhancing Answer Attribution for Faithful Text Generation with Large Language Models

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

Retrieve Anything To Augment Large Language Models

ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering

Retrieval Augmented Generation for Domain-specific Question Answering

Improving Retrieval for RAG based Question Answering Models on Financial Documents

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation