Context-augmented Retrieval: A Novel Framework for Fast Information Retrieval based Response Generation using Large Language Model

Sai Ganesh, Anupam Purwar, Gautam B
2024-06-24
Abstract: Consistently generating high-quality answers from contextual information embedded in the prompt passed to a Large Language Model (LLM) depends on the quality of information retrieval. As the corpus of contextual information grows, the answer/inference quality of Retrieval Augmented Generation (RAG) based Question Answering (QA) systems declines. This work addresses the problem by combining classical text classification with the LLM to enable quick information retrieval from the vector store and to ensure the relevancy of the retrieved information. To that end, it proposes a new approach, Context Augmented Retrieval (CAR), in which the vector database is partitioned by real-time classification of information flowing into the corpus. CAR demonstrates good-quality answer generation along with a significant reduction in information retrieval and answer generation time.
Information Retrieval
What problem does this paper attempt to address?
The paper addresses the decline in retrieval and answer quality that Retrieval-Augmented Generation (RAG) question-answering systems suffer as their knowledge base grows. As the corpus gets larger, a RAG system becomes less effective at retrieving relevant documents and generating high-quality answers, and both retrieval time and response time increase.

To solve this problem, the paper proposes a new framework called Context Augmented Retrieval (CAR). CAR combines traditional text classification methods with large language models (LLMs) to achieve fast information retrieval while ensuring the relevance of the retrieved information. The approach has four steps:

1. **Query Classification**: A classification model classifies each user query in real time, assigning it to a relevant domain or category.
2. **Index Loading**: The index for the predicted domain is loaded, so that context is retrieved only from the partition relevant to the user query.
3. **Hybrid Retriever**: A BM25 retriever and a vector retriever are combined to retrieve relevant information from the loaded index.
4. **Query Engine**: The retrieved context is passed to the LLM along with the user query to generate a coherent, information-rich answer.

Through these steps, CAR significantly reduces the time needed for information retrieval and answer generation while maintaining answer quality.
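The four steps above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy corpus, the keyword-based `classify_query` (standing in for a trained classifier), the bag-of-words cosine similarity (standing in for an embedding-based vector retriever), and the use of reciprocal rank fusion to merge the two rankings are all assumptions made for illustration. The final LLM call is omitted, and the sketch stops at building the prompt.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already partitioned by domain (assumption).
PARTITIONS = {
    "finance": [
        "The quarterly revenue grew due to higher loan interest income.",
        "Bond yields fell after the central bank cut interest rates.",
    ],
    "health": [
        "Regular exercise lowers blood pressure and improves sleep.",
        "Vaccines train the immune system to recognise a virus.",
    ],
}

# Hypothetical keyword lists standing in for a trained text classifier.
DOMAIN_KEYWORDS = {
    "finance": {"revenue", "loan", "bond", "interest", "bank", "rates"},
    "health": {"exercise", "blood", "vaccines", "immune", "virus", "sleep"},
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def classify_query(query):
    """Step 1: pick the domain whose keywords overlap the query most."""
    tokens = set(tokenize(query))
    return max(DOMAIN_KEYWORDS, key=lambda d: len(tokens & DOMAIN_KEYWORDS[d]))


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in one partition."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query, docs):
    """Bag-of-words cosine similarity (placeholder for embeddings)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        v_norm = math.sqrt(sum(c * c for c in v.values()))
        dot = sum(q[t] * v[t] for t in q)
        scores.append(dot / (q_norm * v_norm) if q_norm and v_norm else 0.0)
    return scores


def hybrid_retrieve(query, docs, top_k=1, rrf_k=60):
    """Step 3: merge both rankings with reciprocal rank fusion (assumption)."""
    fused = [0.0] * len(docs)
    for scores in (bm25_scores(query, docs), cosine_scores(query, docs)):
        ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(ranking):
            fused[i] += 1.0 / (rrf_k + rank + 1)
    best = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [docs[i] for i in best[:top_k]]


def build_prompt(query):
    """Steps 1-4, stopping just before the LLM call."""
    domain = classify_query(query)
    docs = PARTITIONS[domain]  # Step 2: load only the matching partition
    context = hybrid_retrieve(query, docs)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return domain, prompt


if __name__ == "__main__":
    domain, prompt = build_prompt(
        "Why did bond yields fall when interest rates were cut?"
    )
    print(domain)
    print(prompt)
```

Because the query is routed to a single partition before any scoring happens, the retriever never touches documents from other domains. This is the source of the retrieval-time saving the summary describes: search cost scales with the size of one partition rather than the whole corpus.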