Abstract: Open-domain generative systems have gained significant attention in the field of conversational AI (e.g., generative search engines). This paper presents a comprehensive review of the attribution mechanisms employed by these systems, particularly large language models. Though attribution or citation improves factuality and verifiability, issues like ambiguous knowledge reservoirs, inherent biases, and the drawbacks of excessive attribution can hinder the effectiveness of these systems. The aim of this survey is to provide valuable insights for researchers, aiding in the refinement of attribution methodologies to enhance the reliability and veracity of responses generated by open-domain generative systems. We believe that this field is still in its early stages; hence, we maintain a repository to keep track of ongoing studies at https://github.com/HITsz-TMG/awesome-llm-attributions.

What problem does this paper attempt to address?

The paper primarily explores the issue of "hallucinations" encountered by Large Language Models (LLMs) when generating content and surveys attribution mechanisms as a way to address this problem. Specifically:

1. **Research Background**: With the rise of open-domain generation systems, especially those based on large language models (such as generative search engines), ensuring the authenticity and reliability of generated content has become a persistent challenge. These issues are commonly referred to as "hallucination" problems, where the generated content contains distorted or fabricated facts and lacks credible sources.
2. **Core Issue**: The paper examines how attribution mechanisms can reduce hallucination by improving the authenticity and verifiability of generated content. Attribution lets users and developers view the possible sources of an answer and assess its authenticity and reliability.
3. **Objective**: With attribution information, generated content can be traced back to specific sources, which makes it easier to verify. The paper also discusses the challenges attribution faces, such as comprehensiveness (high recall) and sufficiency (high precision), and how different methods and techniques achieve effective attribution.
4. **Method Classification**: The paper categorizes attribution methods into three types:
   - Direct Model-Driven Attribution: The model itself provides attribution information for its generated answers.
   - Post-Retrieval Answering: Relevant information is retrieved first, and then answers are generated based on the retrieval results.
   - Post-Generation Attribution: Answers are generated first, and then a search is conducted to find evidence supporting the answers.

Through these methods, the paper hopes to enhance the authenticity and reliability of the content generated by large language models, making it more credible and easier to understand.
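
To make the recall and precision criteria mentioned above concrete, here is a minimal sketch of how they might be computed for one attributed answer. The definitions are an assumption for illustration and are not taken from the paper: recall is treated as the share of statements backed by at least one supporting citation, and precision as the share of emitted citations that actually support their statement. The `Statement` structure and the `supporting` judgments (which in practice would come from human annotators or an entailment model) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One generated sentence and its citations (hypothetical structure)."""
    text: str
    cited: set[str] = field(default_factory=set)       # source ids the model cited
    supporting: set[str] = field(default_factory=set)  # source ids judged to support it


def attribution_recall(statements):
    """Assumed definition: fraction of statements with >= 1 supporting citation."""
    if not statements:
        return 0.0
    covered = sum(1 for s in statements if s.cited & s.supporting)
    return covered / len(statements)


def attribution_precision(statements):
    """Assumed definition: fraction of emitted citations that support their statement."""
    total = sum(len(s.cited) for s in statements)
    if total == 0:
        return 0.0
    correct = sum(len(s.cited & s.supporting) for s in statements)
    return correct / total


# Toy answer with made-up source ids.
answer = [
    Statement("Claim A.", cited={"doc1", "doc2"}, supporting={"doc1"}),  # one extra citation
    Statement("Claim B.", cited={"doc3"}, supporting={"doc3"}),          # fully attributed
    Statement("Claim C.", cited=set(), supporting={"doc4"}),             # missing citation
]

print(attribution_recall(answer))     # 2 of 3 statements are covered
print(attribution_precision(answer))  # 2 of 3 citations are supporting
```

In this toy case the uncited third claim lowers recall, while the superfluous `doc2` citation lowers precision, which mirrors the tension between comprehensive and excessive attribution that the abstract points to.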

A Survey of Large Language Models Attribution
