Abstract: Enterprise retrieval augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a user query, which are passed as context to a synthesizer LLM to generate the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work applies a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions are then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the closest set of synthetic questions, and associated chunks, to the user query. It is found that retrieval with the atoms leads to higher recall than retrieval with chunks. Further performance gain is observed with retrieval using the synthetic questions generated over the atoms. Higher recall at the retrieval step enables higher performance of the enterprise LLM using the RAG pipeline.
What problem does this paper attempt to address?
This paper addresses the limited performance of the retrieval step in enterprise Retrieval-Augmented Generation (RAG) systems. When a RAG system answers queries over internal documents, retrieving the wrong chunks can lead the synthesizer LLM to generate an incorrect answer. To improve retrieval, the authors decompose each chunk into atomic statements, generate synthetic questions over those atoms, and retrieve by matching the user query against the synthetic questions.
### Main problems
1. **Bottleneck in the retrieval step**:
- Existing RAG systems split documents into chunks and then retrieve the chunks most relevant to the user query.
- If the retrieved chunks do not contain the answer, the synthesizer LLM can generate an incorrect response.
2. **Limitations of existing methods**:
- Existing dense retrieval methods directly compare the embedding of the query with the embedding of each document chunk. Queries are usually short questions, whereas a chunk packs many pieces of information into one vector, so this direct comparison can produce inaccurate matches.
- This mismatch between question-style queries and information-dense chunks in the embedding space degrades retrieval performance.
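
The chunk-level baseline described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the example chunks and query are invented for demonstration. The function picks the chunk with the highest cosine similarity to the query, which is equivalent to picking the smallest cosine distance.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedder: lowercase bag-of-words counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_chunk(query: str, chunks: list[str]) -> int:
    """Return the index of the chunk whose embedding is closest to the query."""
    q = embed(query)
    scores = [cosine_similarity(q, embed(c)) for c in chunks]
    return max(range(len(chunks)), key=scores.__getitem__)


# Invented example data, not taken from the paper.
example_chunks = [
    "The office opened in 2019. Staff may request remote work through the HR portal.",
    "Expense reports are due on the fifth day of each month.",
]
best = retrieve_chunk("When are expense reports due?", example_chunks)
print(best)
```

With a real embedding model, `embed` would return a dense vector and the rest of the logic would stay the same.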
### Proposed solutions
1. **Atomic decomposition**:
- Each document chunk is further decomposed into smaller atomic statements. These atomic statements can be either structured (e.g., sentences) or unstructured (a series of independent statements generated by the model).
- In this way, the information in the document chunks can be represented more precisely, reducing the impact of redundant information on retrieval.
2. **Synthetic question generation**:
- For each atomic statement, a set of synthetic questions is generated, with the parent chunk supplied as context. The questions aim to capture the key information in the atom.
- Comparing the query embedding with the embeddings of these synthetic questions, rather than with whole chunks, allows the associated chunks to be found more accurately.
3. **Improved retrieval strategy**:
- Dense retrieval is run over the atomic statements and synthetic questions instead of directly over the original document chunks.
- The paper reports that retrieval with atoms yields higher recall than retrieval with chunks, and that retrieval with synthetic questions yields a further gain, which in turn improves the performance of the full RAG pipeline.
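
The three steps above can be sketched end to end as follows. This is a hedged illustration under several assumptions: atoms are obtained by naive sentence splitting (the structured case), `generate_questions` is a hypothetical callable standing in for an LLM prompt, `embed` is a toy bag-of-words embedder, and the canned questions are invented for the demo. None of these names come from the paper.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for an embedder: lowercase bag-of-words counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_into_atoms(chunk: str) -> list[str]:
    """Structured atoms: one atom per sentence (naive split on ., ! or ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]


def build_question_index(
    chunks: list[str],
    generate_questions: Callable[[str, str], list[str]],
) -> list[tuple[Counter, int, int]]:
    """Embed every synthetic question and remember its (chunk, atom) indices."""
    index = []
    for k, chunk in enumerate(chunks):
        for j, atom in enumerate(split_into_atoms(chunk)):
            for question in generate_questions(atom, chunk):
                index.append((embed(question), k, j))
    return index


def retrieve_by_question(query: str, index: list[tuple[Counter, int, int]]) -> tuple[int, int]:
    """Return (chunk index, atom index) of the synthetic question closest to the query."""
    q = embed(query)
    _, k, j = max(index, key=lambda entry: cosine_similarity(q, entry[0]))
    return k, j


# Hypothetical canned generator; a real system would prompt an LLM here.
CANNED = {
    "The office opened in 2019.": ["When did the office open?"],
    "Staff may request remote work through the HR portal.": ["How can staff request remote work?"],
    "Expense reports are due on the fifth day of each month.": ["When are expense reports due?"],
}


def canned_questions(atom: str, chunk: str) -> list[str]:
    return CANNED.get(atom, [])


example_chunks = [
    "The office opened in 2019. Staff may request remote work through the HR portal.",
    "Expense reports are due on the fifth day of each month.",
]
question_index = build_question_index(example_chunks, canned_questions)
print(retrieve_by_question("How do I request remote work?", question_index))
```

The index stores the chunk index alongside each question so that a match on a question can be mapped back to the chunk that is passed to the synthesizer LLM. Retrieval over atoms instead of questions would follow the same pattern with `embed(atom)` in place of `embed(question)`.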
### Formula representation
- Let \( R(q; c)\in\{0, 1\} \) denote an ideal relevance function: it returns 1 if the document chunk \( c \) contains the answer to the query \( q \), and 0 otherwise.
- Given \( N \) document chunks \( \{c_1, c_2,\dots, c_N\} \) and a user query \( q \), the goal is to retrieve the chunk \( c_k \) that contains the answer, i.e. \( R(q; c_k) = 1 \) while \( R(q; c_i)=0 \) for all other chunks (\( i\neq k \)).
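
Under this formulation, retrieval quality can be measured as the fraction of queries for which the answer-bearing chunk is retrieved. The sketch below assumes exactly one gold chunk per query, as in the formulation above, and measures recall over the top `k` retrieved chunks; the function name, the top-`k` cutoff, and the example rankings are illustrative assumptions rather than details from the paper.

```python
def recall_at_k(ranked_chunks: list[list[int]], gold_chunks: list[int], k: int = 1) -> float:
    """Fraction of queries whose gold chunk index appears in the top-k retrieved chunks.

    ranked_chunks[n] is the ranked list of chunk indices retrieved for query n;
    gold_chunks[n] is the index of the single chunk with R(q_n; c) = 1.
    """
    if len(ranked_chunks) != len(gold_chunks):
        raise ValueError("one ranking is required per query")
    if not gold_chunks:
        raise ValueError("at least one query is required")
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_chunks, gold_chunks))
    return hits / len(gold_chunks)


# Invented rankings for three queries over three chunks.
example_rankings = [[2, 0, 1], [1, 2, 0], [0, 1, 2]]
example_gold = [2, 0, 0]
print(recall_at_k(example_rankings, example_gold, k=1))
```

Here the first and third queries retrieve their gold chunk at rank 1 and the second does not, so two of the three queries count as hits at `k=1`.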
### Conclusion
By decomposing document chunks into atomic statements and generating synthetic questions over them, the proposed method improves recall at the retrieval step of the RAG system. This reduces the mismatch between queries and document chunks in existing dense retrieval and offers a route to better overall performance in enterprise-level RAG systems.
### Reference formulas
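- Embedding representation: \( \mathbf{c}_i = E(c_i), \forall i\in[1, N] \)
- Query embedding: \( \mathbf{q} = E(q) \)
- Dense retrieval selection, where \( \cos[\cdot,\cdot] \) is read as the cosine distance so that the minimum selects the closest chunk: \[ \hat{k}=\arg\min_k\cos[\mathbf{q}, \mathbf{c}_k] \]
- Atomic retrieval selection, where \( \mathbf{a}_j^{(k)} \) is the embedding of the \( j \)-th atom of chunk \( c_k \): \[ \text{[atom]}\quad \hat{k}, \hat{j}=\arg\min_{k,j}\cos[\mathbf{q}, \mathbf{a}_j^{(k)}] \]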
- Synthetic question retrieval selection: \[ [\text{question}]\hat{k}, \hat{j}, \hat{i}=\arg