Abstract: Due to the extraordinarily large number of parameters, fine-tuning Large Language Models (LLMs) to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can alternatively treat an LLM as a black box (i.e., freeze the parameters of the LLM) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs' preferences and concatenate all the retrieved documents as the input, which suffers from two issues: (1) Ignorance of Factual Information. The LLM-preferred documents may not contain the factual information for the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents brings large amounts of unnecessary tokens for LLMs, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework, dubbed FIT-RAG, which utilizes the factual information in the retrieval and reduces the number of tokens for augmentation. FIT-RAG utilizes the factual information by constructing a bi-label document scorer. In addition, it reduces the tokens by introducing a self-knowledge recognizer and a sub-document-level token reducer. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG can improve the answering accuracy of Llama2-13B-Chat by 14.3% on TriviaQA, 19.9% on NQ and 27.5% on PopQA, respectively. Furthermore, it can save approximately half of the tokens on average across the three datasets.

What problem does this paper attempt to address?

The paper addresses how to keep a large language model (LLM) accurate on knowledge-intensive tasks without fine-tuning it, since fine-tuning to update long-tail or out-of-date knowledge is computationally impractical in many applications. It focuses on black-box RAG, where the LLM's parameters are frozen and the model is augmented with retrieved documents. Existing black-box RAG methods have two main problems:

1. **Ignoring factual information**: Existing black-box RAG methods typically tune the retriever to match LLM preferences, without checking whether the documents contain factual information relevant to the question. The retrieved documents may then align with LLM preferences yet lack the facts needed to answer, which hurts the effectiveness of the RAG system.
2. **Token wastage**: Simply concatenating all retrieved documents into the input adds a large number of unnecessary tokens, which increases the computational burden and reduces the efficiency of the RAG system.

To address these issues, the paper proposes a new black-box RAG framework, FIT-RAG (Factual Information and Token Reduction). FIT-RAG improves on existing methods in two ways:

- **Utilizing factual information**: By constructing a Bi-label Document Scorer, FIT-RAG simultaneously considers whether a document contains the answer (Has_Answer) and whether it helps the LLM generate the correct answer (LLM_Prefer). This ensures that the retrieved documents both align with LLM preferences and contain useful factual information.
- **Reducing token count**: By introducing a Bi-faceted Self-Knowledge Recognizer and a Sub-document-level Token Reducer, FIT-RAG avoids unnecessary augmentation and keeps the number of input tokens as small as possible.

Experimental results show that FIT-RAG improves the answer accuracy of the Llama2-13B-Chat model on three open-domain QA datasets (TriviaQA, NQ, and PopQA) by 14.3%, 19.9%, and 27.5%, respectively. It also saves about half of the input tokens on average across the three datasets, which improves token efficiency and reduces computational cost.
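
To make the control flow of the summary above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the weighted-sum combination of the Has_Answer and LLM_Prefer scores, the boolean `llm_knows_answer` flag standing in for the self-knowledge recognizer, the greedy budget-based selection standing in for the token reducer, and every name in the code (`ScoredDoc`, `combined_score`, `count_tokens`, `build_prompt`) are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    text: str
    has_answer: float  # hypothetical score in [0, 1]: does the document contain the answer?
    llm_prefer: float  # hypothetical score in [0, 1]: does the document help the LLM answer?


def combined_score(doc: ScoredDoc, weight: float = 0.5) -> float:
    """Assumed combination rule: a weighted sum of the two label scores."""
    return weight * doc.has_answer + (1.0 - weight) * doc.llm_prefer


def count_tokens(text: str) -> int:
    """Whitespace word count, used here as a stand-in for a real tokenizer."""
    return len(text.split())


def build_prompt(question: str, docs: list[ScoredDoc],
                 llm_knows_answer: bool, token_budget: int) -> str:
    # Stand-in for the self-knowledge recognizer: skip augmentation
    # entirely when the LLM is judged able to answer on its own.
    if llm_knows_answer:
        return question

    # Rank documents by the combined bi-label score, best first.
    ranked = sorted(docs, key=combined_score, reverse=True)

    # Stand-in for the token reducer: greedily keep the highest-ranked
    # documents that still fit within the token budget.
    selected: list[str] = []
    used = 0
    for doc in ranked:
        cost = count_tokens(doc.text)
        if used + cost > token_budget:
            continue
        selected.append(doc.text)
        used += cost

    if not selected:
        return question
    context = "\n".join(selected)
    return f"{context}\n\nQuestion: {question}"
```

In this sketch, a document that scores high on only one label ranks below a document that scores reasonably on both, which mirrors the stated goal of retrieving documents that are both preferred by the LLM and factually relevant. The budget and the weight are free parameters of the example and carry no values from the paper.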

FIT-RAG: Black-Box RAG with Factual Information and Token Reduction
