FIT-RAG: Black-Box RAG with Factual Information and Token Reduction

Yuren Mao, Xuemei Dong, Wenyi Xu, Yunjun Gao, Bin Wei, Ying Zhang
2024-03-21
Abstract: Due to the extraordinarily large number of parameters, fine-tuning Large Language Models (LLMs) to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can instead treat an LLM as a black box (i.e., freeze the parameters of the LLM) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs' preferences and concatenate all the retrieved documents as the input, which suffers from two issues: (1) Ignorance of Factual Information. The LLM-preferred documents may not contain the factual information for the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents brings large amounts of unnecessary tokens for LLMs, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework, dubbed FIT-RAG, which utilizes the factual information in the retrieval and reduces the number of tokens for augmentation. FIT-RAG utilizes the factual information by constructing a bi-label document scorer. In addition, it reduces the tokens by introducing a self-knowledge recognizer and a sub-document-level token reducer. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG can improve the answering accuracy of Llama2-13B-Chat by 14.3% on TriviaQA, 19.9% on NQ and 27.5% on PopQA, respectively. Furthermore, it can save approximately half of the tokens on average across the three datasets.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper addresses how to update long-tail or out-of-date knowledge in large language models (LLMs) for knowledge-intensive tasks without fine-tuning them, since fine-tuning is impractical given their very large parameter counts. Black-box RAG avoids fine-tuning by freezing the LLM's parameters and augmenting it with retrieved documents, but existing black-box RAG methods face two main problems:

1. **Ignoring factual information**: Existing black-box RAG methods typically adjust the retriever based on LLM preferences, ignoring whether the documents contain factual information relevant to the question. The retrieved documents may therefore align with LLM preferences yet lack the facts needed to answer, which reduces the effectiveness of the RAG system.
2. **Token wastage**: Simply concatenating all retrieved documents into the input adds a large number of unnecessary tokens, which increases the computational burden and reduces the efficiency of the RAG system.

To address these issues, the paper proposes a new black-box RAG framework, FIT-RAG (Factual Information and Token Reduction). FIT-RAG improves on existing methods in the following ways:

- **Utilizing factual information**: By constructing a Bi-label Document Scorer, FIT-RAG simultaneously considers whether a document contains the answer (Has_Answer) and whether it helps the LLM generate the correct answer (LLM_Prefer). This ensures that the retrieved documents both align with LLM preferences and contain useful factual information.
- **Reducing token count**: By introducing a Bi-faceted Self-Knowledge Recognizer and a Sub-document-level Token Reducer, FIT-RAG avoids augmentation when it is unnecessary and keeps the number of input tokens as small as possible.

Experimental results show that FIT-RAG improves the answer accuracy of the Llama2-13B-Chat model on three open-domain QA datasets (TriviaQA, NQ, and PopQA) by 14.3%, 19.9%, and 27.5%, respectively. FIT-RAG also saves about half of the input tokens on average across the three datasets, which improves token efficiency and lowers computational cost.
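
To make the interplay of the three components more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. The names `ScoredDoc`, `combined_score`, `count_tokens`, and `build_prompt` are hypothetical. Multiplying the two label scores is an assumed way of combining Has_Answer and LLM_Prefer. A greedy whole-document token budget stands in for the Sub-document-level Token Reducer. The self-knowledge decision is passed in as a plain boolean rather than computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredDoc:
    text: str
    has_answer: float  # assumed: estimated probability the document contains the answer
    llm_prefer: float  # assumed: estimated probability the document helps the LLM answer


def combined_score(doc: ScoredDoc) -> float:
    # Assumption: combine the two labels by multiplication, so a document
    # must do well on both to rank highly. The summary above does not
    # specify how the two scores are combined.
    return doc.has_answer * doc.llm_prefer


def count_tokens(text: str) -> int:
    # Whitespace split as a crude stand-in for a real tokenizer.
    return len(text.split())


def build_prompt(
    question: str,
    docs: List[ScoredDoc],
    llm_knows_answer: bool,
    token_budget: int,
) -> str:
    # Self-knowledge gate: skip retrieval augmentation entirely when the
    # LLM is judged able to answer on its own.
    if llm_knows_answer:
        return question

    # Rank by the combined bi-label score, best first.
    ranked = sorted(docs, key=combined_score, reverse=True)

    # Greedy stand-in for token reduction: keep the highest-scoring
    # documents that still fit within the budget.
    selected: List[str] = []
    used = 0
    for doc in ranked:
        cost = count_tokens(doc.text)
        if used + cost > token_budget:
            continue
        selected.append(doc.text)
        used += cost

    if not selected:
        return question
    context = "\n".join(selected)
    return f"{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        ScoredDoc("Paris is the capital of France", 0.9, 0.8),
        ScoredDoc("France is in Europe and has many cities", 0.2, 0.9),
        ScoredDoc("The Eiffel Tower is in Paris", 0.7, 0.3),
    ]
    print(build_prompt("What is the capital of France?", docs, False, 12))
```

In this toy run the second document scores high on LLM_Prefer but low on Has_Answer, so it ranks last and is dropped once the budget is spent. This is the behaviour the bi-label idea is meant to encourage.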