Large Language Models on Graphs: A Comprehensive Survey

Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, Jiawei Han
2024-10-03
Abstract: Large language models (LLMs), such as GPT4 and LLaMA, are creating significant advancements in natural language processing, due to their strong text encoding/decoding ability and newly found emergent capability (e.g., reasoning). While LLMs are mainly designed to process pure texts, there are many real-world scenarios where text data is associated with rich structure information in the form of graphs (e.g., academic networks, and e-commerce networks) or scenarios where graph data is paired with rich textual information (e.g., molecules with descriptions). Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graphs (i.e., graph-based reasoning). In this paper, we provide a systematic review of scenarios and techniques related to large language models on graphs. We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs. We then discuss detailed techniques for utilizing LLMs on graphs, including LLM as Predictor, LLM as Encoder, and LLM as Aligner, and compare the advantages and disadvantages of different schools of models. Furthermore, we discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future research directions in this fast-growing field. The related source can be found at https://github.com/PeterGriffinJin/Awesome-Language-Model-on-Graphs.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper asks how large language models (LLMs) can be applied to graph-structured data and used to solve graph-related reasoning problems. Specifically, it focuses on the following points:

1. **Combining pure text with graph-structured information**: LLMs are mainly designed to process pure text, but in many real-world scenarios text data is associated with rich structural information (such as the graphs in academic networks and e-commerce networks), or graph data is paired with rich textual information (such as molecules with descriptions). The paper explores how to leverage LLMs in such cases.
2. **Graph reasoning capability**: Although LLMs have shown pure text-based reasoning ability, whether that ability generalizes to graphs is underexplored. The paper investigates whether LLMs can solve basic reasoning problems on graph structures, such as connectivity, shortest path, subgraph matching, and logical rule induction.
3. **Technical classification**: The paper systematically summarizes three main techniques for applying LLMs to graph-structured data:
   - **LLM as a predictor**: The LLM is responsible for the final prediction task.
   - **LLM as an encoder**: The LLM encodes textual information into feature vectors that a graph neural network (GNN) then consumes.
   - **LLM as an aligner**: The LLM works together with a GNN, and text embeddings and graph embeddings are aligned through methods such as contrastive learning.
4. **Application scenarios**: The paper discusses how these methods are used in practical applications, including social networks, academic networks, molecular graphs, and traffic networks, and summarizes open-source code and benchmark datasets.
5. **Future research directions**: The paper proposes possible future research directions, including how to further improve the performance of LLMs on graph reasoning tasks and how to better combine LLMs and GNNs.
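The three roles described above can be made concrete with a small, dependency-free Python sketch. It is an illustration only, not the survey's method: the toy graph, the function names (`graph_to_prompt`, `mean_aggregate`, `info_nce`), the prompt wording, and the temperature value are all assumptions introduced here. No real LLM or GNN is called; plain lists stand in for text embeddings.

```python
import math
from collections import deque

# Toy undirected graph; node ids and edges are hypothetical, for illustration.
EDGES = [(0, 1), (1, 2), (2, 3), (4, 5)]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def graph_to_prompt(edges, source, target):
    """Predictor role: verbalize the graph so a text-only LLM can be queried."""
    edge_text = ", ".join(f"({u}, {v})" for u, v in edges)
    return (
        f"An undirected graph has edges: {edge_text}. "
        f"What is the length of the shortest path from node {source} "
        f"to node {target}?"
    )


def shortest_path_length(adj, source, target):
    """BFS ground truth for checking an LLM's answer; None if unreachable."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_aggregate(adj, features):
    """Encoder role: one GNN-style step that averages each node's text
    embedding with the embeddings of its neighbors (self included)."""
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in sorted(adj.get(node, ()))]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def info_nce(text_vecs, graph_vecs, temperature=0.1):
    """Aligner role: InfoNCE-style contrastive loss; the i-th text vector
    and the i-th graph vector are treated as the positive pair."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    loss = 0.0
    for i, text_vec in enumerate(text_vecs):
        logits = [cosine(text_vec, g) / temperature for g in graph_vecs]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        loss += log_denominator - logits[i]
    return loss / len(text_vecs)
```

In this sketch, `graph_to_prompt` and `shortest_path_length` pair a verbalized graph question with an exact answer, which is one simple way to score an LLM on connectivity or shortest-path questions. `mean_aggregate` shows where LLM-produced text embeddings would enter a GNN layer, and `info_nce` returns a lower loss when matching text and graph embeddings point in the same direction.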
In summary, the paper aims to systematically review and analyze various techniques and application scenarios for applying large language models to graph-structured data, providing guidance and reference for further research in this field.