Understanding Retrieval Augmentation for Long-Form Question Answering

Hung-Ting Chen, Fangyuan Xu, Shane Arora, Eunsol Choi
2023-10-19
Abstract: We present a study of retrieval-augmented language models (LMs) on long-form question answering. We analyze how retrieval augmentation impacts different LMs, by comparing answers generated from models while using the same evidence documents, and how differing quality of the retrieved document set impacts the answers generated from the same LM. We study various attributes of generated answers (e.g., fluency, length, variance) with an emphasis on the attribution of generated long-form answers to in-context evidence documents. We collect human annotations of answer attribution and evaluate methods for automatically judging attribution. Our study provides new insights on how retrieval augmentation impacts long, knowledge-rich text generation of LMs. We further identify attribution patterns for long text generation and analyze the main culprits of attribution errors. Together, our analysis reveals how retrieval augmentation impacts long knowledge-rich text generation and provides directions for future work.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper studies how retrieval-augmented language models perform on long-form question answering (LFQA). Specifically, it focuses on the following aspects:

1. **Impact of Retrieval Augmentation on Different Language Models**:
   - Compare the answers that different language models generate when given the same evidence documents.
   - Analyze how retrieved document sets of differing quality affect the answers generated by the same language model.
2. **Analysis of Generated Answer Attributes**:
   - Study various attributes of the generated answers, such as fluency, length, and variance.
   - Pay special attention to attribution, that is, whether the generated long-form answers are supported by the in-context evidence documents.
3. **Human Annotation and Automatic Evaluation**:
   - Collect human annotations of answer attribution.
   - Evaluate methods for automatically judging attribution.
4. **Impact of Retrieval Augmentation on the Generation of Long, Knowledge-Rich Texts**:
   - Show how retrieval augmentation affects the generation of long, knowledge-rich text.
   - Identify attribution patterns in long-text generation and the main causes of attribution errors.

Together, these analyses show how retrieval augmentation affects long-form text generation and point to directions for future research.
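
To make the idea of sentence-level attribution concrete, here is a minimal sketch of an automatic attribution check. It is not the method evaluated in the paper: it swaps in a crude lexical-overlap heuristic purely for illustration. The function names (`attribution_report`, `supported_fraction`), the tokenizer, the sentence splitter, and the `threshold` value of 0.7 are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def attribution_report(answer, evidence_docs, threshold=0.7):
    """Score each answer sentence against the evidence documents.

    Illustrative assumption: a sentence counts as "supported" when at least
    `threshold` of its distinct tokens appear in a single evidence document.
    Returns a list of (sentence, best_overlap, supported) tuples.
    """
    doc_token_sets = [set(tokenize(doc)) for doc in evidence_docs]
    report = []
    for sentence in split_sentences(answer):
        tokens = set(tokenize(sentence))
        if not tokens or not doc_token_sets:
            best = 0.0
        else:
            # Best overlap ratio over all evidence documents.
            best = max(len(tokens & doc) / len(tokens) for doc in doc_token_sets)
        report.append((sentence, best, best >= threshold))
    return report


def supported_fraction(report):
    """Fraction of answer sentences flagged as supported (0.0 if empty)."""
    if not report:
        return 0.0
    return sum(1 for _, _, ok in report if ok) / len(report)


if __name__ == "__main__":
    docs = ["The Eiffel Tower is in Paris and was completed in 1889."]
    answer = "The Eiffel Tower is in Paris. It was painted bright green by aliens."
    for sentence, score, ok in attribution_report(answer, docs):
        print(f"{ok!s:5} {score:.2f} {sentence}")
```

A lexical heuristic like this will miss paraphrases and can be fooled by shared vocabulary, which is one reason a study of this kind compares automatic judgments against human annotations.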