Abstract: Enterprise retrieval-augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Chunks relevant to a user query are then retrieved and passed as context to a synthesizer LLM, which generates the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work applies a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions is then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the set of synthetic questions, and associated chunks, closest to the user query. Retrieval with the atoms is found to give higher recall than retrieval with chunks. A further performance gain is observed when retrieving with the synthetic questions generated over the atoms. Higher recall at the retrieval step enables higher performance of the enterprise LLM using the RAG pipeline.
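The indexing side of the pipeline in the abstract (chunk, then atoms, then synthetic questions that point back to their chunk) can be sketched as below. This is a minimal illustration, not the authors' implementation: `IndexEntry`, `split_into_atoms`, `generate_questions`, and `build_question_index` are hypothetical names, atoms are approximated by sentences, and the question generator is a fixed template standing in for an LLM call.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    text: str      # the text that would be embedded (here: a synthetic question)
    chunk_id: int  # index k of the chunk this entry points back to
    atom_id: int   # index j of the atom within chunk k


def split_into_atoms(chunk: str) -> list[str]:
    """Assumption: one atom per sentence (a simple, structured atomisation)."""
    parts = re.split(r"(?<=[.!?])\s+", chunk.strip())
    return [p for p in parts if p]


def generate_questions(atom: str, chunk: str) -> list[str]:
    """Hypothetical placeholder for an LLM that writes questions about `atom`
    with `chunk` as context; here it only fills a fixed template."""
    return [f"What does the document say about: {atom}"]


def build_question_index(chunks: list[str]) -> list[IndexEntry]:
    """Flatten chunks into question entries that remember their source chunk."""
    index: list[IndexEntry] = []
    for k, chunk in enumerate(chunks):
        for j, atom in enumerate(split_into_atoms(chunk)):
            for question in generate_questions(atom, chunk):
                index.append(IndexEntry(question, k, j))
    return index


chunks = [
    "Refunds are issued within 14 days. Shipping fees are not refunded.",
    "Support is open on weekdays.",
]
index = build_question_index(chunks)
for entry in index:
    print(entry.chunk_id, entry.atom_id, entry.text)
```

Each entry keeps its `chunk_id`, so whichever question ends up closest to a user query can be mapped straight back to the chunk that is passed to the synthesizer LLM.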

What problem does this paper attempt to address?

This paper addresses the limited performance of the retrieval step in enterprise Retrieval-Augmented Generation (RAG) systems. When a RAG system retrieves the wrong chunks from internal documents, the language model may generate an incorrect answer. The authors propose decomposing document chunks into atomic statements and generating synthetic questions from those statements, which improves retrieval accuracy.

### Main problems

1. **Bottleneck in the retrieval step**:
   - In existing RAG systems, documents are divided into chunks, and relevant chunks are retrieved for each user query.
   - If the retrieved chunks are inaccurate, the downstream language model generates incorrect answers.
2. **Limitations of existing methods**:
   - Existing dense retrieval methods directly compare the embedding of the query with the embeddings of the document chunks. Queries are usually phrased as questions, whereas a chunk contains a large amount of information, so this direct comparison can produce inaccurate matches.
   - This mismatch between the embedding of a short question and the embedding of an information-dense chunk leads to poor retrieval performance.

### Proposed solutions

1. **Atomic decomposition**:
   - Each document chunk is further decomposed into smaller atomic statements. These atoms can be either structured (e.g., sentences) or unstructured (a series of independent statements generated by the model).
   - The information in a chunk is thereby represented more precisely, reducing the impact of redundant information on retrieval.
2. **Synthetic question generation**:
   - For each atomic statement, a set of synthetic questions is generated.
   - The questions are generated from the content of the atomic statement and aim to capture its key information.
   - Comparing the query embedding with the embeddings of these synthetic questions locates the relevant document chunks more accurately.
3. **Improved retrieval strategy**:
   - Dense retrieval is performed over the atomic statements and synthetic questions instead of the original document chunks.
   - Experimental results show that this yields higher retrieval recall, which in turn improves the performance of the entire RAG system.

### Formula representation

- Let \( R(q; c)\in\{0, 1\} \) be an ideal relevance function: it returns 1 if the document chunk \( c \) contains the answer to the query \( q \), and 0 otherwise.
- Given \( N \) document chunks \( \{c_1, c_2,\dots, c_N\} \) and a user query \( q \), the goal is to retrieve the chunk \( c_k \) that contains the answer, such that \( R(q; c_k) = 1 \) and \( R(q; c_i)=0 \) for all other chunks (\( i\neq k \)).

### Conclusion

By decomposing document chunks into atomic statements and generating synthetic questions, the proposed method improves the accuracy of the retrieval step in the RAG system. This addresses the mismatch between queries and document chunks in existing methods and suggests a route to better overall performance for enterprise-level RAG systems.

### Reference formulas

- Embedding representation: \( \mathbf{c}_i = E(c_i), \forall i\in[1, N] \)
- Query embedding: \( \mathbf{q} = E(q) \)
- Dense retrieval selection: \[ \hat{k}=\arg\min_k\cos[\mathbf{q}, \mathbf{c}_k] \] For \( \arg\min \) to select the closest chunk, \( \cos[\cdot,\cdot] \) has to be read as a cosine distance; with cosine similarity the selection would be an \( \arg\max \).
- Atomic retrieval selection: \[ [\text{atom}]\quad \hat{k}, \hat{j}=\arg\min_{k,j}\cos[\mathbf{q}, \mathbf{a}_j^{(k)}] \]
- Synthetic question retrieval selection: \[ [\text{question}]\quad \hat{k}, \hat{j}, \hat{i}=\arg\dots \]
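The atomic retrieval selection and the relevance function \( R(q; c) \) can be illustrated with a small, self-contained sketch. Several things here are assumptions made for the example: `embed` is a toy bag-of-words counter standing in for the encoder \( E \), \( \cos[\cdot,\cdot] \) is implemented as a cosine distance so that `min` picks the closest atom, and the queries, atoms, and gold labels are invented. `retrieve_chunk` and `recall_at_1` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for the encoder E: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(u: Counter, v: Counter) -> float:
    """1 - cosine similarity; smaller means closer."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def retrieve_chunk(query, atoms_per_chunk):
    """Return (k_hat, j_hat): the chunk and atom indices whose atom
    embedding has the smallest cosine distance to the query embedding."""
    q = embed(query)
    _, k_hat, j_hat = min(
        (cosine_distance(q, embed(atom)), k, j)
        for k, atoms in enumerate(atoms_per_chunk)
        for j, atom in enumerate(atoms)
    )
    return k_hat, j_hat


def recall_at_1(queries, gold_chunks, atoms_per_chunk):
    """Fraction of queries whose retrieved chunk is the gold chunk,
    i.e. the chunk for which R(q; c) = 1."""
    hits = sum(
        retrieve_chunk(q, atoms_per_chunk)[0] == gold
        for q, gold in zip(queries, gold_chunks)
    )
    return hits / len(queries)


# Invented example data: atoms grouped by the chunk they came from.
atoms_per_chunk = [
    ["Refunds are issued within 14 days.", "Shipping fees are not refunded."],
    ["Support is open on weekdays."],
]
queries = ["When are refunds issued?", "Is support open on weekdays?"]
gold_chunks = [0, 1]

print(retrieve_chunk(queries[0], atoms_per_chunk))
print(recall_at_1(queries, gold_chunks, atoms_per_chunk))
```

Replacing each atom with its synthetic questions, and adding a third index for the question, would give the question-based variant; the mapping back to the chunk index stays the same.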

Question-Based Retrieval using Atomic Units for Enterprise RAG
