Bayesian inference to improve quality of Retrieval Augmented Generation

Dattaraj Rao
2024-08-12
Abstract: Retrieval Augmented Generation (RAG) is the most popular pattern for modern Large Language Model (LLM) applications. RAG takes a user query and finds relevant paragraphs of context in a large corpus, typically stored in a vector database. After this first-level search over the vector database, the top n chunks of relevant text are included directly in the context and sent as the prompt to the LLM. The problem with this approach is that the quality of the text chunks depends on the effectiveness of the search. There is no strong post-processing after the search to determine whether a chunk holds enough information to be included in the prompt. In addition, chunks often contain conflicting information on the same subject, and the model has no prior experience telling it which chunk to prioritize. This frequently leads the model to state that the sources conflict and that it cannot produce an answer. In this research we propose a Bayesian approach to verify the quality of text chunks from the search results. Bayes' theorem relates the conditional probability of a hypothesis to the evidence and to prior probabilities. We propose that estimating the likelihood that a text chunk yields a quality answer, and combining it with a prior probability of the chunk's quality, can improve the overall quality of responses from RAG systems. We can use the LLM itself to obtain the likelihood of relevance of a context paragraph. For the prior probability of a text chunk, we use its page number in the parsed document. The assumption is that paragraphs on earlier pages are more likely to contain key findings and to be more relevant for generalizing an answer.
Information Retrieval
What problem does this paper attempt to address?
The paper attempts to address the issue of how to improve the quality of text blocks in Retrieval Augmented Generation (RAG) systems, thereby enhancing the quality of answers generated by large language models (LLMs). Specifically, when handling user queries, RAG systems retrieve relevant text blocks from a large corpus and pass these blocks as context to the LLM. However, existing methods have the following problems:

1. **Unstable Retrieval Quality**: The quality of the retrieved text blocks depends on the effectiveness of the search, and there is no strong post-processing to verify whether the text blocks contain sufficient information.
2. **Information Conflict**: The retrieved text blocks may contain contradictory information on the same topic, and the model lacks prior experience to decide which text block to prioritize, leaving it unable to generate a clear answer.

To solve these problems, the paper proposes a Bayesian inference-based method that filters for high-quality text blocks by calculating their relevance and prior probability, thereby improving the overall response quality of the RAG system. The specific methods include:

- **Calculating Likelihood**: Using the LLM to evaluate the relevance of each text block to the query.
- **Prior Probability**: Assigning different prior probabilities to text blocks based on factors such as the page number in the document, assuming that paragraphs on earlier pages are more likely to contain key information.
- **Bayesian Formula**: Combining likelihood and prior probability to calculate the posterior probability of each text block, and selecting text blocks with posterior probabilities above a threshold as the final input.

Using this method, the authors found that the answer quality of the RAG system improved by 30%, validating the effectiveness of the approach.
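
The three steps above (likelihood, page-based prior, posterior threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear decay in `page_prior`, the `floor` value, the normalization of posteriors across the retrieved chunks, and the threshold are all assumptions chosen for illustration, and the `likelihood` field stands in for a relevance score in [0, 1] that would come from asking the LLM.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int          # 1-based page number in the source document
    likelihood: float  # assumed LLM relevance score in [0, 1]


def page_prior(page: int, total_pages: int, floor: float = 0.1) -> float:
    """Hypothetical prior: 1.0 on the first page, decaying linearly to `floor` on the last."""
    if total_pages <= 1:
        return 1.0
    frac = (page - 1) / (total_pages - 1)
    return 1.0 - (1.0 - floor) * frac


def posteriors(chunks: list[Chunk], total_pages: int) -> list[float]:
    """Bayes' rule per chunk: posterior is proportional to likelihood * prior.

    Assumption: the evidence term is the sum over the retrieved chunks,
    so the posteriors of one retrieval result sum to 1.
    """
    unnormalized = [c.likelihood * page_prior(c.page, total_pages) for c in chunks]
    evidence = sum(unnormalized)
    if evidence == 0:
        return [0.0] * len(chunks)
    return [u / evidence for u in unnormalized]


def select_chunks(chunks: list[Chunk], total_pages: int, threshold: float) -> list[Chunk]:
    """Keep only chunks whose posterior reaches the threshold."""
    post = posteriors(chunks, total_pages)
    return [c for c, p in zip(chunks, post) if p >= threshold]


# Usage with made-up scores: two equally relevant chunks, one early and one late.
chunks = [
    Chunk("summary of findings", page=1, likelihood=0.8),
    Chunk("appendix note", page=10, likelihood=0.8),
    Chunk("unrelated table", page=5, likelihood=0.1),
]
selected = select_chunks(chunks, total_pages=10, threshold=0.2)
```

With these illustrative numbers, the two chunks that the LLM scores equally are separated by the prior alone, so only the first-page chunk passes the threshold. In a real system the threshold and the shape of the prior would need tuning against the corpus.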