Clustered Retrieved Augmented Generation (CRAG)

Simon Akesson, Frances A. Santos

2024-05-25

Abstract: Providing external knowledge to Large Language Models (LLMs) is key to using these models in real-world applications, for several reasons: it incorporates up-to-date content in real time, provides access to domain-specific knowledge, and helps prevent hallucination. The vector database-based Retrieval Augmented Generation (RAG) approach has been widely adopted to this end, since it lets any part of the external knowledge be retrieved and provided to an LLM as input context. Despite the success of RAG, it may still be infeasible for some applications, because the retrieved context can demand a longer context window than the LLM supports. Even when the retrieved context fits into the context window, the number of tokens can be substantial and, consequently, increase costs and processing time, becoming impractical for most applications. To address these issues, we propose CRAG, a novel approach that effectively reduces the number of prompt tokens without degrading the quality of the generated response compared to a solution using RAG. Through our experiments, we show that CRAG can reduce the number of tokens by at least 46%, and by more than 90% in some cases, compared to RAG. Moreover, the number of tokens with CRAG does not increase considerably as the number of reviews analyzed grows, unlike RAG, where the number of tokens is almost 9x higher with 75 reviews than with 4 reviews.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The problem this paper attempts to address is how to reduce the number of tokens needed to supply external knowledge to large language models (LLMs) in practical applications, without compromising the quality of the generated responses. Traditional Retrieval Augmented Generation (RAG) can provide external knowledge, but the retrieved context may exceed the context window supported by the LLM when large amounts of data are involved, and even when it fits, the token count increases cost and processing time. The paper therefore proposes a new method, Clustered Retrieved Augmented Generation (CRAG), which reduces the number of input tokens through three steps: clustering, summarization, and aggregation. In the paper's experiments, CRAG reduces the number of tokens by at least 46% compared to RAG, and by more than 90% in some cases. Its token count also stays roughly stable as the number of reviews analyzed grows, whereas with RAG the token count is almost 9x higher for 75 reviews than for 4 reviews. Additionally, the responses generated by CRAG are semantically very similar to those generated by RAG, which indicates that the token savings do not come at the expense of response quality.
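The three steps named above (clustering, summarization, aggregation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The bag-of-words `embed` function, the plain `kmeans` routine, and the truncating `summarize` function are hypothetical stand-ins for whatever embedding model, clustering algorithm, and LLM summarization call a real system would use, and `count_tokens` approximates tokens by whitespace-separated words.

```python
import math
import random
from collections import Counter


def embed(text, vocab):
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(texts, max_words=12):
    """Hypothetical placeholder for an LLM summarization call.

    Keeps the shortest text of the cluster, truncated to max_words words.
    """
    shortest = min(texts, key=lambda t: len(t.split()))
    return " ".join(shortest.split()[:max_words])


def crag_context(reviews, k):
    """Cluster the reviews, summarize each cluster, and aggregate the summaries."""
    vocab = sorted({w for r in reviews for w in r.lower().split()})
    vectors = [embed(r, vocab) for r in reviews]
    labels = kmeans(vectors, k)                      # step 1: clustering
    clusters = {}
    for review, lab in zip(reviews, labels):
        clusters.setdefault(lab, []).append(review)
    summaries = [summarize(clusters[lab]) for lab in sorted(clusters)]  # step 2
    return "\n".join(summaries)                      # step 3: aggregation


def count_tokens(text):
    """Crude token count: whitespace-separated words (assumption)."""
    return len(text.split())


if __name__ == "__main__":
    reviews = [
        "The battery life is great and lasts all day",
        "Battery lasts long, great battery overall",
        "Great battery, charges fast",
        "The screen is dim and hard to read outside",
        "Screen brightness is poor, very dim screen",
        "Dim screen, disappointing display",
    ]
    rag_context = "\n".join(reviews)        # baseline: pass every review as context
    context = crag_context(reviews, k=2)    # clustered, summarized, aggregated
    print(count_tokens(rag_context), "tokens without clustering")
    print(count_tokens(context), "tokens with the clustered sketch")
```

Because the prompt receives one summary per cluster rather than every retrieved review, the context size in this sketch depends mainly on the number of clusters and not on the number of reviews, which is consistent with the scaling behavior the paper reports for CRAG.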
