LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model

Tianqianjin Lin, Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Jun Lin, Weikang Yuan, Junjie Cao, Changlong Sun, Xiaozhong Liu
2024-10-19
Abstract: Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often incorporate specialized modules tailored to particular task types, losing their applicability to other graph learning tasks and contradicting the original intent of foundation models to be universal. Therefore, to enhance consistency, coverage, and diversity across domains, tasks, and research interests within the graph learning community in the evaluation of GFMs, we propose GFMBench, a systematic and comprehensive benchmark comprising 26 datasets. Moreover, we introduce LangGFM, a novel GFM that relies entirely on large language models. By revisiting and exploring the effective graph textualization principles, as well as repurposing successful techniques from graph augmentation and graph self-supervised learning within the language space, LangGFM achieves performance on par with or exceeding the state of the art across GFMBench, which can offer us new perspectives, experiences, and baselines to drive forward the evolution of GFMs.
Machine Learning, Artificial Intelligence, Social and Information Networks
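The abstract mentions graph textualization and repurposing graph augmentation within the language space. The following Python is a minimal sketch under our own assumptions: a simple node-list/edge-list text format and random node relabeling as the augmentation. Neither is claimed to be LangGFM's actual textualization format or augmentation, and `textualize` and `permute_node_ids` are hypothetical helper names. The sketch shows how one graph can yield several different but equivalent textual descriptions.

```python
import random


def textualize(nodes: dict[int, str], edges: list[tuple[int, int]]) -> str:
    """Serialize a graph as plain text: a node list with attributes, then an edge list."""
    lines = ["Nodes:"]
    for node_id in sorted(nodes):
        lines.append(f"  node {node_id}: {nodes[node_id]}")
    lines.append("Edges:")
    for u, v in sorted(edges):
        lines.append(f"  {u} -- {v}")
    return "\n".join(lines)


def permute_node_ids(
    nodes: dict[int, str], edges: list[tuple[int, int]], seed: int = 0
) -> tuple[dict[int, str], list[tuple[int, int]]]:
    """Relabel node IDs with a random permutation.

    The result is isomorphic to the input (same attributes, same connectivity),
    so its textualization is a different string describing the same graph.
    """
    rng = random.Random(seed)
    old_ids = sorted(nodes)
    new_ids = old_ids[:]
    rng.shuffle(new_ids)
    mapping = dict(zip(old_ids, new_ids))
    new_nodes = {mapping[n]: attr for n, attr in nodes.items()}
    # Store each undirected edge with its smaller endpoint first.
    new_edges = [tuple(sorted((mapping[u], mapping[v]))) for u, v in edges]
    return new_nodes, new_edges


if __name__ == "__main__":
    nodes = {0: "paper about GNNs", 1: "paper about LLMs", 2: "survey paper"}
    edges = [(0, 1), (1, 2)]
    print(textualize(nodes, edges))
    print(textualize(*permute_node_ids(nodes, edges, seed=1)))
```

Because relabeling preserves the graph up to isomorphism, both serializations should carry the same label. Training on several serializations of the same graph is one plausible way to discourage a language model from relying on arbitrary node IDs.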
What problem does this paper attempt to address?
This paper attempts to solve the following two main problems:

1. **Unified evaluation criteria and cross-task applicability**:
   - In current research on Graph Foundation Models (GFMs), different studies adopt different data processing and evaluation settings, which hinders an in-depth understanding of their progress. Specifically, differences in datasets, label rates, and evaluation metrics make it difficult to compare models.
   - In addition, existing GFMs usually focus on specific subsets of graph learning tasks (such as structural tasks, node-level tasks, or classification tasks) and introduce specialized modules for them. This limits their applicability to other types of graph learning tasks and contradicts the original intention that foundation models should be general-purpose.
2. **Exploring the potential of large language models in graph learning**:
   - The paper proposes a new approach: relying entirely on large language models (LLMs) to build a powerful graph foundation model, LangGFM. By revisiting and exploring effective graph textualization principles, and by applying successful graph augmentation and graph self-supervised learning techniques within the language space, LangGFM achieves performance comparable to or better than existing state-of-the-art models across many graph learning tasks.
   - This approach not only simplifies model design but also offers new perspectives, experiences, and baselines for advancing GFMs.

### Solution

To address the above challenges, the authors make two main contributions:

1. **GFMBench benchmark**:
   - A systematic and comprehensive benchmark comprising 26 datasets, designed to ensure a consistent evaluation pipeline, diversity of graph domains, and coverage of a wide range of tasks and research interests. This makes GFMs easier to compare and their generality easier to assess.
2. **LangGFM model**:
   - LangGFM is a new graph foundation model built entirely on LLMs. It converts graph data into natural-language descriptions and directly fine-tunes the LLM to perform graph tasks, avoiding specialized modules such as traditional graph neural networks (GNNs).
   - Extensive evaluation on GFMBench shows that LangGFM matches or exceeds state-of-the-art performance across diverse tasks, demonstrating the strong potential of LLMs for graph learning.

### Formula Presentation

Some of the key formulas in the paper are as follows:

- Mapping function for a graph machine learning problem \(\pi_i\):
  \[ f_{\pi_i}: G_i \rightarrow Y_i, \quad f_{\pi_i}(G_j^i) = Y_j^i \]
  where \( G_j^i \in G_i \) and \( Y_j^i \in Y_i \) denote the \(j\)-th graph of problem \(\pi_i\) and its corresponding label, respectively.
- An ideal graph foundation model as a single function over all problems in \(\Pi\):
  \[ f_{GFM} : \bigcup_{\pi_i \in \Pi} G_i \rightarrow \bigcup_{\pi_i \in \Pi} Y_i, \quad f_{GFM}(G_j^i) = Y_j^i \]

Through these contributions, the paper provides new directions and benchmarks for future research on GFMs.
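The contrast between the per-problem mappings \(f_{\pi_i}\) and the single \(f_{GFM}\) can be sketched in Python. Under our own assumptions, each example is turned into a (prompt, answer) pair of strings, with the task identified by a natural-language instruction. Pooling these pairs gives one shared input space and one shared output space, so a single text-to-text model trained on them could play the role of \(f_{GFM}\). The task names, instructions, and helpers (`Example`, `to_instruction_pair`, `build_unified_dataset`) are hypothetical. The fine-tuning step is omitted, and none of this reproduces LangGFM's actual prompts.

```python
from dataclasses import dataclass

# Hypothetical instructions; each one identifies a task pi_i inside the prompt text.
INSTRUCTIONS: dict[str, str] = {
    "edge_count": "How many edges does this graph have?",
    "is_connected": "Is this graph connected? Answer True or False.",
}


@dataclass(frozen=True)
class Example:
    task: str        # the task pi_i this example belongs to
    graph_text: str  # a textualized graph G_j^i
    label: object    # its label Y_j^i; the Python type differs across tasks


def to_instruction_pair(ex: Example) -> tuple[str, str]:
    """Map one (graph, label) example into a (prompt, answer) pair of strings."""
    prompt = f"{ex.graph_text}\nQuestion: {INSTRUCTIONS[ex.task]}"
    return prompt, str(ex.label)


def build_unified_dataset(examples_by_task: dict[str, list[Example]]) -> list[tuple[str, str]]:
    """Pool every task's examples into one text-to-text training set.

    Inputs from all tasks share one space (strings), and so do outputs, which is
    what lets a single model stand in for every f_{pi_i} at once.
    """
    pairs: list[tuple[str, str]] = []
    for examples in examples_by_task.values():
        pairs.extend(to_instruction_pair(ex) for ex in examples)
    return pairs


if __name__ == "__main__":
    data = {
        "edge_count": [Example("edge_count", "0 -- 1\n1 -- 2", 2)],
        "is_connected": [Example("is_connected", "0 -- 1\n2 -- 3", False)],
    }
    for prompt, answer in build_unified_dataset(data):
        print(prompt, "->", answer)
```

Serializing labels with `str` is the simplest choice. Numeric targets would likely need a fixed precision or format so that generated answers can be parsed back reliably.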