Improving Retrieval Augmented Language Model with Self-Reasoning

Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang
2024-08-02
Abstract: The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. Specifically, retrieving irrelevant documents may result in unhelpful responses or even degrade the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework involves constructing self-reasoning trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We have evaluated our framework across four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which can outperform existing state-of-the-art models and achieve performance comparable to GPT-4 while using only 2,000 training samples.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of how to improve the reliability and traceability of Retrieval-Augmented Language Models (RALMs). Although existing RALMs perform well on knowledge-intensive tasks, two challenges remain:

1. **Irrelevant Document Retrieval**: Retrieved documents that are irrelevant to the question may lead to unhelpful answers and can even degrade the performance of large language models (LLMs).
2. **Lack of Proper Citations**: The generated output lacks proper citations, which makes it difficult to verify the model's trustworthiness.

To tackle these challenges, the authors propose a self-reasoning framework (SELF-REASONING) that improves the reliability and traceability of RALMs by leveraging reasoning trajectories generated by the LLM itself. The framework consists of three processes:

1. **Relevance-Aware Process (RAP)**: Guides the LLM to judge whether the retrieved documents are relevant to the question and to explain why they are considered relevant.
2. **Evidence-Aware Selective Process (EAP)**: Guides the LLM to select and cite relevant documents and to automatically choose key sentences from them as evidence.
3. **Trajectory Analysis Process (TAP)**: Requires the LLM to analyze the self-reasoning trajectories generated by the first two processes and to produce a concise analysis and a final answer.

Together, these processes aim to make RALMs more accurate and reliable on knowledge-intensive tasks while making their outputs easier to trace back to sources. According to the abstract, experiments on four public datasets show that the method outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.
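The flow of the three processes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Trajectory` class, the `self_reasoning` function, the prompt wording, and the output parsing are all hypothetical, and `llm` stands for any callable that maps a prompt string to a completion string. The sketch chains three separate prompts purely to show how the output of each process feeds the next; it makes no claim about how the paper trains or prompts its model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    """Hypothetical container for one self-reasoning trajectory."""
    relevant: bool
    relevance_reason: str
    evidence: List[str] = field(default_factory=list)  # cited key sentences
    analysis: str = ""
    answer: str = ""


def self_reasoning(question: str,
                   documents: List[str],
                   llm: Callable[[str], str]) -> Trajectory:
    # Number the documents so later stages can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))

    # Relevance-Aware Process: judge relevance and explain the judgment.
    # Assumed output convention: the reply starts with "Relevant" or "Irrelevant".
    rap = llm(
        "RAP: Are the documents relevant to the question? "
        "Start with 'Relevant' or 'Irrelevant', then explain.\n"
        f"Question: {question}\nDocuments:\n{numbered}"
    ).strip()
    traj = Trajectory(relevant=rap.lower().startswith("relevant"),
                      relevance_reason=rap)

    # Evidence-Aware Selective Process: pick cited key sentences.
    # Skipping this stage for irrelevant documents is a choice of this sketch.
    if traj.relevant:
        eap = llm(
            "EAP: Quote the key sentences that support an answer, one per "
            "line, each prefixed with its document number in brackets.\n"
            f"Question: {question}\nDocuments:\n{numbered}"
        )
        traj.evidence = [ln.strip() for ln in eap.splitlines() if ln.strip()]

    # Trajectory Analysis Process: analyze the trajectory, then answer.
    # Assumed output convention: "<analysis> Answer: <answer>".
    tap = llm(
        "TAP: Analyze the reasoning so far, then finish with 'Answer: ...'.\n"
        f"Question: {question}\nRelevance: {traj.relevance_reason}\n"
        "Evidence:\n" + "\n".join(traj.evidence)
    )
    analysis, _, answer = tap.partition("Answer:")
    traj.analysis, traj.answer = analysis.strip(), answer.strip()
    return traj
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client, and returning the whole trajectory rather than only the answer preserves the cited evidence that the traceability goal depends on.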