Abstract: The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. Specifically, retrieving irrelevant documents may result in unhelpful responses or even degrade the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We evaluate our framework on four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.

What problem does this paper attempt to address?

The paper addresses the problem of improving the reliability and traceability of Retrieval-Augmented Language Models (RALMs). Although existing RALMs perform well on knowledge-intensive tasks, two challenges remain:

1. **Irrelevant Document Retrieval**: Irrelevant retrieved documents may lead to unhelpful answers and can even degrade the performance of large language models (LLMs).
2. **Lack of Proper Citations**: The generated output lacks proper citations, making it difficult to verify the model's trustworthiness.

To tackle these challenges, the authors propose a new self-reasoning framework (SELF-REASONING) that improves the reliability and traceability of RALMs by leveraging reasoning trajectories generated by the LLM itself. The framework includes three processes:

1. **Relevance-Aware Process (RAP)**: Guides the LLM to judge whether the retrieved documents are relevant to the question and to explain that judgment.
2. **Evidence-Aware Selective Process (EAP)**: Guides the LLM to select and cite relevant documents and to automatically choose key sentences from them as evidence.
3. **Trajectory Analysis Process (TAP)**: Requires the LLM to analyze the self-reasoning trajectories generated by the first two processes and then produce a concise analysis and a final answer.

Through these processes, the framework aims to improve the accuracy and reliability of RALMs on knowledge-intensive tasks while enhancing their traceability. According to the abstract, experiments on four public datasets show that the method outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.
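The three processes described above can be sketched as a small pipeline. This is only an illustrative sketch and not the authors' implementation: the `LLM` callable, the `SelfReasoningTrajectory` class, the function names, and the prompt wording are all hypothetical, and the sketch assumes three sequential model calls, which the summary does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


@dataclass
class SelfReasoningTrajectory:
    relevance: str  # output of the relevance-aware process (RAP)
    evidence: str   # output of the evidence-aware selective process (EAP)
    analysis: str   # output of the trajectory analysis process (TAP)


def format_documents(documents: List[str]) -> str:
    """Number the retrieved documents so later stages can cite them as [1], [2], ..."""
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))


def self_reason(question: str, documents: List[str], llm: LLM) -> SelfReasoningTrajectory:
    """Run the three stages in order, feeding each stage the earlier outputs."""
    context = f"Question: {question}\nDocuments:\n{format_documents(documents)}\n"

    # RAP: judge relevance of the retrieved documents and explain the judgment.
    relevance = llm(
        context
        + "Judge whether the documents are relevant to the question and explain why."
    )

    # EAP: select and cite relevant documents, quoting key sentences as evidence.
    evidence = llm(
        context
        + f"Relevance judgment: {relevance}\n"
        + "Cite the relevant documents by number and quote the key sentence from each."
    )

    # TAP: analyze the trajectories from RAP and EAP, then give a concise answer.
    analysis = llm(
        context
        + f"Relevance judgment: {relevance}\nEvidence: {evidence}\n"
        + "Analyze the reasoning above and give a concise analysis and final answer."
    )

    return SelfReasoningTrajectory(relevance, evidence, analysis)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM library; a real system would replace it with a call to a trained model.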

Improving Retrieval Augmented Language Model with Self-Reasoning

RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback

RRAML: Reinforced Retrieval Augmented Machine Learning

RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models

Rethinking with Retrieval: Faithful Large Language Model Inference

Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

Improving Language Model Reasoning with Self-motivated Learning

Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models

Furthest Reasoning with Plan Assessment: Stable Reasoning Path with Retrieval-Augmented Large Language Models

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

In-Context Retrieval-Augmented Language Models

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

R4: Reinforced Retriever-Reorder-Responder for Retrieval-Augmented Large Language Models

Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning