SFR-RAG: Towards Contextually Faithful LLMs

Xuan-Phi Nguyen, Shrey Pandit, Senthil Purushwalkam, Austin Xu, Hailin Chen, Yifei Ming, Zixuan Ke, Silvio Savarese, Caiming Xiong, Shafiq Joty
2024-09-16
Abstract: Retrieval Augmented Generation (RAG), a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance, has emerged as a pivotal area in generative AI. The LLMs used in RAG applications are required to faithfully and completely comprehend the provided context and users' questions, avoid hallucination, handle unanswerable, counterfactual or otherwise low-quality and irrelevant contexts, perform complex multi-hop reasoning and produce reliable citations. In this paper, we introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization. We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks, such as HotpotQA and TriviaQA, with consistent RAG settings to ensure reproducibility and consistency in model assessments. Experimental results demonstrate that our SFR-RAG-9B model outperforms leading baselines such as Command-R+ (104B) and GPT-4o, achieving state-of-the-art results in 3 out of 7 benchmarks in ContextualBench with significantly fewer parameters. The model is also shown to be resilient to alteration in the contextual information and behave appropriately when relevant context is removed. Additionally, the SFR-RAG model maintains competitive performance in general instruction-following tasks and function-calling capabilities.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how, within the Retrieval-Augmented Generation (RAG) framework, large language models (LLMs) can understand and use external context more reliably, so that generated answers are more factually accurate, relevant, and trustworthy. Specifically, the paper focuses on the following aspects:
1. **Reducing hallucinations**: preventing LLMs from generating inaccurate or factually inconsistent content that the provided context does not support.
2. **Handling low-quality or irrelevant context**: when the provided context is of low quality, counterfactual, or irrelevant to the question, the model should recognize this and respond appropriately.
3. **Multi-hop reasoning ability**: the model needs to reason across multiple context fragments to generate accurate answers.
4. **Citation ability**: the model should reliably cite sources in the context, increasing the credibility of its answers.
To address these problems, the paper introduces SFR-RAG, a small LLM that is instruction-tuned to improve context understanding in RAG applications and to minimize hallucination. The paper also proposes a new evaluation framework, ContextualBench, which evaluates models on multiple RAG benchmarks under consistent settings to ensure that results are reproducible and comparable.
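To make the idea of a standardized evaluation setting concrete, here is a minimal Python sketch of an evaluation loop in which every model receives the same retrieved passages through the same prompt template and is scored with the same metric. This is an illustration only, not the ContextualBench implementation: the `Example` class, the prompt wording, the `evaluate` function, and the choice of normalized exact match are all assumptions introduced for this example.

```python
import re
import string
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """Hypothetical record: one question with fixed retrieved passages."""
    question: str
    contexts: List[str]  # the same passages are shown to every model
    answer: str          # gold answer


def build_prompt(example: Example) -> str:
    """Render one fixed template so every model sees identical input."""
    passages = "\n".join(
        f"[{i}] {c}" for i, c in enumerate(example.contexts, start=1)
    )
    return (
        "Answer using only the passages below and cite them as [n]. "
        "If they do not contain the answer, reply 'unanswerable'.\n\n"
        f"{passages}\n\nQuestion: {example.question}\nAnswer:"
    )


def normalize(text: str) -> str:
    """Lowercase; drop [n] citations, punctuation, and articles."""
    text = text.lower()
    text = re.sub(r"\[\d+\]", " ", text)
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)


def evaluate(model: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of examples answered exactly right under the fixed template."""
    if not examples:
        return 0.0
    hits = sum(exact_match(model(build_prompt(ex)), ex.answer) for ex in examples)
    return hits / len(examples)
```

Here `model` is any callable that maps a prompt string to an answer string, so different systems can be compared by passing each one to `evaluate` with the same list of examples. Fixing the passages, template, and metric in one place is what makes scores comparable across models in this sketch.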