Attribute First, then Generate: Locally-attributable Grounded Text Generation

Aviv Slobodkin, Eran Hirsch, Arie Cattan, Tal Schuster, Ido Dagan

2024-07-04

Abstract: Recent efforts to address hallucinations in Large Language Models (LLMs) have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attributable text generation approach, prioritizing concise attributions. Our method, named "Attribute First, then Generate", breaks down the conventional end-to-end generation process into three intuitive steps: content selection, sentence planning, and sequential sentence generation. By initially identifying relevant source segments ("select first") and then conditioning the generation process on them ("then generate"), we ensure these segments also act as the output's fine-grained attributions ("select" becomes "attribute"). Tested on Multi-document Summarization and Long-form Question-answering, our method not only yields more concise citations than the baselines but also maintains, and in some cases enhances, both generation quality and attribution accuracy. Furthermore, it significantly reduces the time required for fact verification by human assessors.

Computation and Language

What problem does this paper attempt to address?

The paper addresses hallucination in large language models (LLMs), that is, generated text that is not supported by the source material. Existing attributed-generation methods do supply supporting sources for their output, but the citations often point to entire documents or paragraphs, so users must still do substantial work to verify each fact. The paper therefore proposes locally-attributable text generation, which aims for concise, fine-grained attributions. Its framework, "Attribute First, then Generate", decomposes the conventional end-to-end generation process into three steps: content selection, sentence planning, and sentence-by-sentence generation. Because relevant source segments are selected first and generation is then conditioned on them, those same segments serve as the output's fine-grained attributions. On multi-document summarization and long-form question answering, the method yields more concise citations than the baselines while maintaining, and in some cases improving, both generation quality and attribution accuracy, and it significantly reduces the time human assessors need for fact verification.
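The three-step decomposition can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: every name here (`Span`, `AttributedSentence`, `select_content`, `plan_sentences`, `toy_generator`, `attribute_first_then_generate`) is invented for the example, and the keyword-based selection, fixed-size grouping, and string-joining generator are toy placeholders for the model-driven components a real system would use. What the sketch does show is the structural point of the approach: each output sentence's citations are exactly the source spans it was conditioned on.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Span:
    """A selected source segment, given as character offsets into one document."""
    doc_id: int
    start: int  # inclusive
    end: int    # exclusive


@dataclass
class AttributedSentence:
    text: str
    citations: List[Span]  # exactly the spans this sentence was conditioned on


def span_text(docs: Sequence[str], span: Span) -> str:
    return docs[span.doc_id][span.start:span.end]


def select_content(docs: Sequence[str], keyword: str) -> List[Span]:
    """Step 1, content selection (toy stand-in): keep source sentences mentioning keyword."""
    spans: List[Span] = []
    for doc_id, doc in enumerate(docs):
        start = 0
        for piece in doc.split(". "):
            end = start + len(piece)
            if keyword.lower() in piece.lower():
                spans.append(Span(doc_id, start, end))
            start = end + 2  # skip the ". " separator
    return spans


def plan_sentences(spans: List[Span], max_per_sentence: int = 2) -> List[List[Span]]:
    """Step 2, sentence planning (toy stand-in): group spans, one group per output sentence."""
    return [spans[i:i + max_per_sentence] for i in range(0, len(spans), max_per_sentence)]


# A generator sees the sentences produced so far plus the current highlights.
Generator = Callable[[List[str], List[str]], str]


def toy_generator(prefix: List[str], highlights: List[str]) -> str:
    """Hypothetical placeholder for a model call: fuses the highlights, ignores the prefix."""
    return "; ".join(h.rstrip(".") for h in highlights) + "."


def attribute_first_then_generate(
    docs: Sequence[str], keyword: str, generate: Generator = toy_generator
) -> List[AttributedSentence]:
    spans = select_content(docs, keyword)        # "select first"
    plan = plan_sentences(spans)
    output: List[AttributedSentence] = []
    for group in plan:                           # Step 3: one sentence at a time
        highlights = [span_text(docs, s) for s in group]
        prefix = [sent.text for sent in output]
        sentence = generate(prefix, highlights)  # "then generate"
        # "select becomes attribute": the conditioning spans are the citations
        output.append(AttributedSentence(sentence, list(group)))
    return output


if __name__ == "__main__":
    docs = [
        "The bridge opened in 1932. It carries eight lanes. Tolls fund maintenance.",
        "Engineers inspect the bridge yearly. The bridge was repainted in 2020.",
    ]
    for sent in attribute_first_then_generate(docs, "bridge"):
        print(sent.text)
        for span in sent.citations:
            print("   cites:", repr(span_text(docs, span)))
```

Replacing `toy_generator` with a function that prompts a language model using the prefix and highlights would leave the attribution logic untouched, since in this sketch citations are recorded from the plan rather than recovered after generation.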