Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, Jiawei Han
2024-10-03
Abstract: Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and co-authorships) and form a (text-attributed) graph. The knowledge in such graphs is encoded not only in single texts/nodes but also in their associated connections. To facilitate research on augmenting LLMs with graphs, we manually construct a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. Then, we propose a simple and effective framework called Graph Chain-of-Thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively. Each Graph-CoT iteration consists of three sub-steps: LLM reasoning, LLM-graph interaction, and graph execution. We conduct systematic experiments with three LLM backbones on GRBench, where Graph-CoT outperforms the baselines consistently. The code is available at https://github.com/PeterGriffinJin/Graph-CoT.
Computation and Language, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses hallucination in large language models (LLMs) on knowledge-intensive tasks. Although LLMs perform well in general, they often generate incorrect information or conclusions when a task requires precise factual knowledge. To tackle this, the paper makes two main contributions:

1. **Constructing a benchmark dataset, GRBench**: The dataset contains 1,740 questions over ten real-world graph datasets from five domains (academia, e-commerce, literature, healthcare, and law). It is designed to evaluate how effectively LLMs can interact with domain-specific graph data to answer questions.
2. **Proposing a framework called Graph Chain-of-Thought (Graph-CoT)**: Instead of feeding an entire subgraph to the LLM as context, Graph-CoT lets the LLM traverse the graph iteratively to find the key information it needs. Each iteration has three sub-steps: reasoning (the LLM determines what can be concluded from the information gathered so far and what additional information is needed), interaction (the LLM generates the request needed to obtain that information from the graph), and execution (the request is executed on the graph and the result is returned to the LLM).

Through these contributions, the authors aim to improve LLMs' ability to exploit complex graph structures and, in turn, their performance on knowledge-intensive tasks. In experiments with three LLM backbones on GRBench, Graph-CoT consistently outperforms the baselines, including traditional retrieval-based methods, with especially clear gains on questions that require multi-step reasoning. Nevertheless, there remains room for improvement on some of the more complex multi-hop questions.
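The reasoning, interaction, and execution cycle described above can be sketched as a small Python loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy graph, the four graph functions (`RetrieveNode`, `NodeFeature`, `NeighborCheck`, `NodeDegree`), the `Finish` action, and `scripted_llm` (a fixed script standing in for a real LLM) are all hypothetical. The sketch shows how each iteration turns a thought into a graph request and feeds the result back to the model.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical toy text-attributed graph: each node has text attributes
# plus typed neighbor lists (paper <-> author edges).
GRAPH: Dict[str, Dict[str, Any]] = {
    "p1": {"type": "paper", "title": "Graph Chain-of-Thought", "authors": ["a1", "a2"]},
    "p2": {"type": "paper", "title": "Another Paper", "authors": ["a2"]},
    "a1": {"type": "author", "name": "Alice", "papers": ["p1"]},
    "a2": {"type": "author", "name": "Bob", "papers": ["p1", "p2"]},
}

# Hypothetical graph functions the LLM may request in the interaction sub-step.
def retrieve_node(text: str) -> Optional[str]:
    """Return the id of the first node whose title or name exactly matches text."""
    for node_id, attrs in GRAPH.items():
        if text in (attrs.get("title"), attrs.get("name")):
            return node_id
    return None

def node_feature(node_id: str, feature: str) -> Any:
    """Return one text attribute of a node."""
    return GRAPH[node_id][feature]

def neighbor_check(node_id: str, neighbor_type: str) -> List[str]:
    """Return the ids of a node's neighbors of the given type."""
    return GRAPH[node_id].get(neighbor_type, [])

def node_degree(node_id: str, neighbor_type: str) -> int:
    """Return how many neighbors of the given type a node has."""
    return len(neighbor_check(node_id, neighbor_type))

GRAPH_FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "RetrieveNode": retrieve_node,
    "NodeFeature": node_feature,
    "NeighborCheck": neighbor_check,
    "NodeDegree": node_degree,
}

Action = Tuple[str, tuple]       # (function name, arguments)
Step = Tuple[str, Action, Any]   # (thought, action, observation)

def scripted_llm(question: str, history: List[Step]) -> Tuple[str, Action]:
    """Stand-in for an LLM covering reasoning and interaction.
    A real system would prompt an LLM with the question and history;
    this fixed script only handles the example question below."""
    if not history:
        return "Find the node for the author Bob.", ("RetrieveNode", ("Bob",))
    if len(history) == 1:
        node = history[-1][2]
        return f"Bob is {node}; count his paper neighbors.", ("NodeDegree", (node, "papers"))
    return "The degree answers the question.", ("Finish", (history[-1][2],))

def graph_cot(question: str,
              llm: Callable[[str, List[Step]], Tuple[str, Action]],
              max_iters: int = 10) -> Tuple[Any, List[Step]]:
    """Repeat reasoning -> interaction -> execution until the LLM emits Finish."""
    history: List[Step] = []
    for _ in range(max_iters):
        thought, (name, args) = llm(question, history)  # reasoning + interaction
        if name == "Finish":
            return args[0], history
        observation = GRAPH_FUNCTIONS[name](*args)       # execution on the graph
        history.append((thought, (name, args), observation))
    return None, history  # no answer within max_iters

answer, trace = graph_cot("How many papers has Bob written?", scripted_llm)
for thought, action, observation in trace:
    print(thought, action, "->", observation)
print("Answer:", answer)
```

The answer here comes from an edge count (Bob's `papers` neighbors), not from any single node's text, which is the kind of knowledge the abstract says is encoded in connections. Replacing `scripted_llm` with a real model call also requires parsing the model's text output into a `(name, args)` pair, which this sketch leaves out. The `max_iters` cap keeps the loop bounded if the model never emits `Finish`.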