Abstract: Large Language Models are revolutionizing Web, mobile, and Web of Things systems, driving intelligent and scalable solutions. However, as Retrieval-Augmented Generation (RAG) systems expand, they encounter significant challenges related to scalability, including increased delay and communication overhead. To address these issues, we propose EACO-RAG, an edge-assisted distributed RAG system that leverages adaptive knowledge updates and inter-node collaboration. By distributing vector datasets across edge nodes and optimizing retrieval processes, EACO-RAG significantly reduces delay and resource consumption while enhancing response accuracy. The system employs a multi-armed bandit framework with safe online Bayesian methods to balance performance and cost. Extensive experimental evaluation demonstrates that EACO-RAG outperforms traditional centralized RAG systems in both response time and resource efficiency. EACO-RAG effectively reduces delay and resource expenditure to levels comparable to, or even lower than, those of local RAG systems, while significantly improving accuracy. This study presents the first systematic exploration of edge-assisted distributed RAG architectures, providing a scalable and cost-effective solution for large-scale distributed environments.
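To make the idea of distributing vector datasets across edge nodes more concrete, here is a minimal sketch of collaborative retrieval. It is not the EACO-RAG implementation: the `EdgeNode` class, the `collaborative_search` function, the similarity threshold, and the rule "ask neighbors only when the local match is weak" are all assumptions introduced for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EdgeNode:
    """Hypothetical edge node holding a small local vector store."""

    def __init__(self, name):
        self.name = name
        self.store = []  # list of (embedding, text) pairs

    def add(self, embedding, text):
        # Stand-in for an adaptive knowledge update: new chunks are appended.
        self.store.append((embedding, text))

    def search(self, query, k):
        # Score every local chunk and return the k best as (score, text, node).
        scored = [(cosine(query, emb), text, self.name) for emb, text in self.store]
        return sorted(scored, reverse=True)[:k]


def collaborative_search(local, neighbors, query, k=2, threshold=0.8):
    """Answer locally when the best local hit is good enough (assumed rule);
    otherwise merge results from neighboring nodes and re-rank."""
    hits = local.search(query, k)
    if hits and hits[0][0] >= threshold:
        return hits
    for node in neighbors:
        hits.extend(node.search(query, k))
    return sorted(hits, reverse=True)[:k]
```

In this sketch a query that the local node covers well never leaves the node, which is one simple way to trade communication overhead against retrieval quality; a real system would use an approximate-nearest-neighbor index rather than a linear scan.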

What problem does this paper attempt to address?

The problem this paper attempts to address is how to optimize Retrieval-Augmented Generation (RAG) systems through adaptive knowledge updates and edge-node collaboration so as to reduce resource consumption, lower latency, and improve response accuracy. Specifically, the paper proposes an edge-assisted distributed RAG system named EACO-RAG, which targets the scalability challenges that existing RAG systems encounter as they expand, such as increased latency and communication overhead.

### Main Issues

1. **Scalability Challenges**: As RAG systems expand, especially in large-scale distributed environments, latency and communication overhead grow significantly.
2. **Resource Consumption**: Traditional centralized RAG systems consume large amounts of computational resources when handling numerous queries, which increases cost.
3. **Response Accuracy**: In large-scale distributed environments, maintaining or improving the accuracy of generated responses is a key issue.

### Solution

The paper proposes the EACO-RAG system to address the above issues through the following methods:

1. **Edge Assistance**: Distributing vector datasets across multiple edge nodes, leveraging edge computing to reduce latency and resource consumption.
2. **Adaptive Knowledge Updates**: Edge nodes dynamically update their local knowledge bases, adjusting in real time to user behavior and needs.
3. **Inter-Node Collaboration**: Using a multi-armed bandit framework with safe online Bayesian methods to balance performance and cost, optimizing retrieval and generation strategies.

### Experimental Results

Experimental results show that EACO-RAG outperforms traditional centralized RAG systems in response time and resource utilization, significantly reducing latency and cost while improving accuracy.

### Contributions

1. **Systematically proposing and studying the edge-assisted distributed RAG architecture for the first time**, providing a cost-efficient solution through adaptive knowledge updates and inter-node collaboration.
2. **Designing an adaptive knowledge update mechanism**, enabling edge nodes to dynamically adjust local knowledge bases as user behavior and needs change.
3. **Optimizing the retrieval process**, integrating edge collaboration to better balance real-time performance and resource efficiency, ensuring scalability in distributed systems.
4. **Conducting extensive experimental evaluations**, validating the superiority of EACO-RAG in response time and resource utilization.

In summary, through the EACO-RAG system this paper provides an effective method for optimizing RAG in large-scale distributed environments, addressing the scalability, latency, and resource-consumption challenges of existing systems.
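The balance between performance and cost that the summary attributes to a multi-armed bandit can be illustrated with a small sketch. The code uses plain Beta-Bernoulli Thompson sampling with a linear cost penalty, which is a simpler stand-in and not the paper's safe online Bayesian method. The arm names (`local`, `edge`, `cloud`), the costs, the accuracies, and the `cost_weight` parameter are all hypothetical values chosen for illustration.

```python
import random


class CostAwareBandit:
    """Thompson sampling over retrieval strategies with a cost penalty.

    Assumption: each arm has a known, fixed cost and an unknown
    probability of producing a correct answer.
    """

    def __init__(self, costs, cost_weight=0.3, seed=0):
        self.costs = dict(costs)
        self.cost_weight = cost_weight
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on each arm's accuracy.
        self.wins = {arm: 1 for arm in self.costs}
        self.losses = {arm: 1 for arm in self.costs}

    def select(self):
        # Sample a plausible accuracy per arm, subtract the weighted cost,
        # and pick the arm with the best sampled utility.
        def utility(arm):
            sampled = self.rng.betavariate(self.wins[arm], self.losses[arm])
            return sampled - self.cost_weight * self.costs[arm]

        return max(self.costs, key=utility)

    def update(self, arm, correct):
        # Bayesian update of the Beta posterior for the chosen arm.
        if correct:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(rounds=3000, seed=0):
    """Run the bandit against made-up accuracies and count arm pulls."""
    costs = {"local": 0.1, "edge": 0.3, "cloud": 1.0}        # illustrative
    accuracy = {"local": 0.40, "edge": 0.85, "cloud": 0.90}  # illustrative
    bandit = CostAwareBandit(costs, seed=seed)
    env = random.Random(seed + 1)
    pulls = {arm: 0 for arm in costs}
    for _ in range(rounds):
        arm = bandit.select()
        pulls[arm] += 1
        bandit.update(arm, env.random() < accuracy[arm])
    return pulls
```

With these made-up numbers the expected utilities are 0.37 for `local`, 0.595 for `edge`, and 0.60 for `cloud` before weighting is applied to cost as written above (accuracy minus 0.3 times cost gives 0.37, 0.76, and 0.60 respectively), so the sampler should concentrate on the `edge` arm over time: it is nearly as accurate as `cloud` at a fraction of the cost. A safe variant would additionally constrain exploration so that cost or latency budgets are not violated while learning.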

EACO-RAG: Edge-Assisted and Collaborative RAG with Adaptive Knowledge Update

Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents

EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

Retrieval-Augmented Generation for Large Language Models: A Survey

LightRAG: Simple and Fast Retrieval-Augmented Generation

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity

ERATTA: Extreme RAG for Table To Answers with Large Language Models

DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models

CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models

Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems