Abstract: GraphRAG addresses significant challenges in Retrieval-Augmented Generation (RAG) by leveraging graphs with embedded knowledge to enhance the reasoning capabilities of Large Language Models (LLMs). Despite its promising potential, the GraphRAG community currently lacks a unified framework for fine-grained decomposition of the graph-based knowledge retrieval process. Furthermore, there is no systematic categorization or evaluation of existing solutions within the retrieval process. In this paper, we present LEGO-GraphRAG, a modular framework that decomposes the retrieval process of GraphRAG into three interconnected modules: subgraph-extraction, path-filtering, and path-refinement. We systematically summarize and classify the algorithms and neural network (NN) models relevant to each module, providing a clearer understanding of the design space for GraphRAG instances. Additionally, we identify key design factors, such as Graph Coupling and Computational Cost, that influence the effectiveness of GraphRAG implementations. Through extensive empirical studies, we construct high-quality GraphRAG instances using a representative selection of solutions and analyze their impact on retrieval and reasoning performance. Our findings offer critical insights into optimizing GraphRAG instance design, ultimately contributing to the advancement of more accurate and contextually relevant LLM applications.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to address several key challenges in **GraphRAG** (Graph-based Retrieval-Augmented Generation):

1. **Lack of a unified framework**: The GraphRAG community currently lacks a unified framework for systematically classifying and evaluating existing solutions (i.e., algorithms and neural network models). This makes it difficult to summarize and classify existing GraphRAG work, and it hinders clear identification of how effective specific solutions actually are within the GraphRAG process.
2. **Insufficient modularity**: Current research often treats GraphRAG as a single monolithic process without modular decomposition, which obscures the contribution of each potential module to overall performance. A more fine-grained modular framework would help in analyzing the trade-offs between module performance and solution selection, and in guiding the design of GraphRAG instances that meet the requirements of specific scenarios.

### Solutions

To address these problems, the paper proposes a unified, modular research framework named **LEGO-GraphRAG** and establishes three key criteria:

1. **Modularization of GraphRAG**: LEGO-GraphRAG decomposes the process of retrieving "inference paths" into three interconnected and flexible modules: **Subgraph-Extraction**, **Path-Filtering**, and **Path-Refinement**.
2. **Solutions for GraphRAG**: LEGO-GraphRAG systematically summarizes and classifies the algorithms and neural network models available for each module, providing a clear view of the potential design space of GraphRAG instances.
3. **Design factors of GraphRAG**: LEGO-GraphRAG identifies two main factors that affect the design of GraphRAG instances, namely **Graph Coupling** and **Computational Cost**, and analyzes how these factors constrain the solutions available for each module.
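The three-module decomposition can be illustrated with a minimal Python sketch. This is not the paper's implementation: the triple-based graph representation, the function names (`extract_subgraph`, `filter_paths`, `refine_paths`, `retrieve`), and the specific algorithms chosen for each stage (k-hop breadth-first expansion, depth-limited path enumeration, keyword-overlap ranking) are all assumptions made for illustration. The point is only that each stage sits behind a plain function interface and can be swapped independently.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (head, relation, tail); hypothetical representation
Path = List[Triple]


def extract_subgraph(triples: List[Triple], seeds: Set[str], hops: int = 2) -> List[Triple]:
    """Subgraph-extraction stand-in: keep triples reachable within `hops` of a seed."""
    adjacency: Dict[str, List[Triple]] = {}
    for triple in triples:
        adjacency.setdefault(triple[0], []).append(triple)
    depth = {seed: 0 for seed in seeds}
    queue = deque(sorted(seeds))
    kept: List[Triple] = []
    while queue:
        node = queue.popleft()
        if depth[node] >= hops:
            continue  # do not expand beyond the hop budget
        for triple in adjacency.get(node, []):
            kept.append(triple)
            tail = triple[2]
            if tail not in depth:
                depth[tail] = depth[node] + 1
                queue.append(tail)
    return kept


def filter_paths(subgraph: List[Triple], seeds: Set[str], max_len: int = 2) -> List[Path]:
    """Path-filtering stand-in: enumerate simple paths of up to `max_len` edges from each seed."""
    adjacency: Dict[str, List[Triple]] = {}
    for triple in subgraph:
        adjacency.setdefault(triple[0], []).append(triple)
    paths: List[Path] = []

    def walk(node: str, path: Path, visited: Set[str]) -> None:
        for triple in adjacency.get(node, []):
            tail = triple[2]
            if tail in visited:
                continue  # keep paths simple (no repeated entities)
            extended = path + [triple]
            paths.append(extended)
            if len(extended) < max_len:
                walk(tail, extended, visited | {tail})

    for seed in sorted(seeds):
        walk(seed, [], {seed})
    return paths


def refine_paths(paths: List[Path], question: str, top_k: int = 2) -> List[Path]:
    """Path-refinement stand-in: rank paths by keyword overlap with the question."""
    question_tokens = set(question.lower().split())

    def score(path: Path) -> int:
        tokens: Set[str] = set()
        for head, relation, tail in path:
            tokens.update(f"{head} {relation} {tail}".lower().replace("_", " ").split())
        return len(tokens & question_tokens)

    return sorted(paths, key=score, reverse=True)[:top_k]


def retrieve(triples, seeds, question,
             extract=extract_subgraph, filt=filter_paths, refine=refine_paths):
    """Compose the three modules; each argument can be replaced by another solution."""
    return refine(filt(extract(triples, seeds), seeds), question)


# Toy knowledge graph, invented for this example.
triples = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Paris", "located_in", "Europe"),
    ("Berlin", "capital_of", "Germany"),
]
result = retrieve(triples, {"Paris"}, "Which country is Paris the capital of")
for path in result:
    print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in path))
```

Because `retrieve` takes the three stages as parameters, replacing one stage (for example, a learned ranker in place of `refine_paths`) leaves the other two untouched, which is the kind of per-module comparison the framework is meant to support.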
### Experiments and analysis

Using the LEGO-GraphRAG framework, the authors construct a set of high-quality GraphRAG instances by combining the most representative algorithms and neural network models from each solution type, while ensuring that every solution type is covered for each module. Through extensive empirical studies, they analyze both the overall retrieval performance and the LLM-based reasoning performance of these instances, and synthesize the experimental results into several key insights for developing GraphRAG instances from multiple analytical perspectives.

### Conclusions

By proposing the LEGO-GraphRAG framework, the paper fills the gap in systematic classification and evaluation in the GraphRAG field, and it provides a theoretical basis and practical guidance for designing more efficient and accurate GraphRAG instances, ultimately supporting the use of large language models in a wider range of application scenarios.

LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration
