Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks

Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang
2024-12-17
Abstract: With the increasing prevalence of cross-domain Text-Attributed Graph (TAG) Data (e.g., citation networks, recommendation systems, social networks, and ai4science), the integration of Graph Neural Networks (GNNs) and Large Language Models (LLMs) into a unified Model architecture (e.g., LLM as enhancer, LLM as collaborators, LLM as predictor) has emerged as a promising technological paradigm. The core of this new graph learning paradigm lies in the synergistic combination of GNNs' ability to capture complex structural relationships and LLMs' proficiency in understanding informative contexts from the rich textual descriptions of graphs. Therefore, we can leverage graph description texts with rich semantic context to fundamentally enhance Data quality, thereby improving the representational capacity of model-centric approaches in line with data-centric machine learning principles. By leveraging the strengths of these distinct neural network architectures, this integrated approach addresses a wide range of TAG-based Task (e.g., graph learning, graph reasoning, and graph question answering), particularly in complex industrial scenarios (e.g., supervised, few-shot, and zero-shot settings). In other words, we can treat text as a medium to enable cross-domain generalization of graph learning Model, allowing a single graph model to effectively handle the diversity of downstream graph-based Task across different data domains. This work serves as a foundational reference for researchers and practitioners looking to advance graph learning methodologies in the rapidly evolving landscape of LLM. We consistently maintain the related open-source materials at https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers.
Machine Learning, Artificial Intelligence, Computation and Language, Databases
What problem does this paper attempt to address?
This paper addresses how to combine Graph Neural Networks (GNNs) with Large Language Models (LLMs) to handle Text-Attributed Graph (TAG) data across complex domains. Specifically, it focuses on the following points:

1. **Improving data representation ability**: Traditional methods struggle to fully capture the complex structural patterns in graph data, especially on large-scale graphs, where relationships between nodes depend on context and differ significantly across domains. Combining LLMs with GNNs lets the natural language understanding and generation capabilities of LLMs extract text features and feed them into the graph learning process, which enhances the graph model's data representation ability.
2. **Improving model generalization ability**: Traditional graph learning techniques generalize poorly in cross-domain applications. Integrating LLMs into the GNN architecture as enhancers, collaborators, or predictors lets the model generalize better across domains, reduces the need for domain-specific retraining, and broadens its flexibility and range of application.
3. **Enhancing task reasoning ability**: As industrial application scenarios grow more complex, models must reason effectively across multiple large-scale data sources. LLMs are strong at understanding context and can support few-shot and zero-shot reasoning. Combined with the structural learning ability of GNNs, the integrated approach yields stronger reasoning and better adaptation to new tasks across graph learning tasks, from node classification to link prediction.
4. **Promoting the development of graph learning methods**: The paper systematically summarizes research progress around the three pillars of data, model, and task, identifies current challenges, and proposes future research directions, with the aim of providing a foundational reference for researchers and practitioners.

In general, the core objective of this paper is to explore how integrating LLMs and GNNs can achieve breakthroughs in data representation, model generalization, and task reasoning, and thereby advance the field of graph learning.
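To make the "LLM as enhancer" idea from the first point more concrete, the following is a minimal sketch rather than the survey's method. It assumes a toy TAG with three papers, and it replaces the LLM text encoder with a hypothetical stand-in, `embed_text`, that builds a hashed bag-of-words vector. The names `embed_text`, `propagate`, `node_text`, and `DIM` are all illustrative and not taken from the paper. The point is only the data flow: text is turned into node features, which a GNN-style mean-aggregation step then mixes along the graph structure.

```python
import zlib

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def embed_text(text, dim=DIM):
    """Hypothetical stand-in for an LLM text encoder: hashed bag-of-words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def propagate(features, edges):
    """One mean-aggregation step over each node and its neighbors."""
    neighbors = {node: {node} for node in features}  # include self-loops
    for u, v in edges:  # treat edges as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        for node, nbrs in neighbors.items()
    }


# Toy text-attributed graph: node ids mapped to their textual descriptions.
node_text = {
    "p1": "graph neural networks for citation networks",
    "p2": "large language models for text understanding",
    "p3": "language models on text-attributed graphs",
}
edges = [("p1", "p3"), ("p2", "p3")]

# Step 1: text -> features (the "enhancer" role, here faked by hashing).
text_features = {node: embed_text(text) for node, text in node_text.items()}
# Step 2: structure-aware mixing of those features (the GNN role).
node_repr = propagate(text_features, edges)
```

In a real pipeline the stand-in encoder would be replaced by embeddings from an actual language model, and the single averaging step by a trained GNN; the sketch keeps both deliberately trivial so that it runs with the standard library alone.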