Abstract: Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often incorporate specialized modules tailored to particular task types, losing their applicability to other graph learning tasks and contradicting the original intent of foundation models to be universal. Therefore, to enhance consistency, coverage, and diversity across domains, tasks, and research interests within the graph learning community in the evaluation of GFMs, we propose GFMBench, a systematic and comprehensive benchmark comprising 26 datasets. Moreover, we introduce LangGFM, a novel GFM that relies entirely on large language models. By revisiting and exploring the effective graph textualization principles, as well as repurposing successful techniques from graph augmentation and graph self-supervised learning within the language space, LangGFM achieves performance on par with or exceeding the state of the art across GFMBench, which can offer us new perspectives, experiences, and baselines to drive forward the evolution of GFMs.

What problem does this paper attempt to address?

This paper attempts to solve the following two main problems:

1. **Unified evaluation criteria and cross-task applicability**:
   - In current research on Graph Foundation Models (GFMs), different studies adopt different data processing and evaluation settings, which hinders an in-depth understanding of the progress of these models. Specifically, differing datasets, label rates, and evaluation metrics make it difficult to compare models.
   - In addition, existing GFMs usually focus on specific subsets of graph learning tasks (such as structural tasks, node-level tasks, or classification tasks) and introduce specialized modules for them. This limits their applicability to other types of graph learning tasks and goes against the original intention that foundation models should be general-purpose.
2. **Exploring the potential of large language models in graph learning**:
   - The paper proposes a new method that relies entirely on large language models (LLMs) to build a graph foundation model. By re-examining effective graph textualization principles and applying successful graph augmentation and graph self-supervised learning techniques in the language space, LangGFM achieves performance comparable to or better than the existing state-of-the-art models across the graph learning tasks in GFMBench.
   - This method not only simplifies the model design but also provides new perspectives and experiences, laying a foundation for the further development of GFMs.

### Solution

To address the above challenges, the authors make two main contributions:

1. **GFMBench Benchmark**:
   - A systematic and comprehensive benchmark is constructed, containing 26 datasets. It aims to ensure the consistency of the evaluation pipeline, the diversity of graph domains, and the coverage of a wide range of tasks and research interests. This helps to improve the comparability and generality of GFMs.
2. **LangGFM Model**:
   - A new LLM-based graph foundation model, LangGFM, is proposed. The model converts graph data into natural-language descriptions and directly fine-tunes the LLM to perform graph tasks, thus avoiding the need for specialized modules such as traditional graph neural networks (GNNs).
   - Extensive evaluation of LangGFM on GFMBench shows that it performs well on various tasks, demonstrating the potential of LLMs in graph learning.

### Formula Presentation

Some of the key formulas involved in the paper are as follows:

- Mapping function for a graph machine learning problem \( \pi_i \):
  \[ f_{\pi_i}: G_i \rightarrow Y_i, \quad f_{\pi_i}(G_j^i) = Y_j^i \]
  where \( G_j^i \in G_i \) and \( Y_j^i \in Y_i \) represent the \( j \)-th graph and its corresponding label, respectively.
- Single-function representation of an ideal graph foundation model:
  \[ f_{GFM} : \bigcup_{\pi_i \in \Pi} G_i \rightarrow \bigcup_{\pi_i \in \Pi} Y_i, \quad f_{GFM}(G_j^i) = Y_j^i \]

Through these improvements, the paper provides new directions and benchmarks for future research and development of GFMs.
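The idea of turning a graph into text and pairing it with a task description, so that one model can serve many problems in the spirit of \( f_{GFM} \), can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Graph` class, the `textualize` and `build_prompt` helpers, and the plain-text layout are hypothetical and are not the textualization format or code used by LangGFM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny attributed graph: node id -> descriptive text, plus undirected edges."""
    nodes: dict[int, str]
    edges: list[tuple[int, int]] = field(default_factory=list)


def textualize(graph: Graph) -> str:
    """Render the graph as plain text (hypothetical layout, not the paper's format)."""
    lines = ["Nodes:"]
    for node_id in sorted(graph.nodes):
        lines.append(f"- node {node_id}: {graph.nodes[node_id]}")
    lines.append("Edges:")
    for u, v in graph.edges:
        lines.append(f"- node {u} -- node {v}")
    return "\n".join(lines)


def build_prompt(graph: Graph, task_instruction: str) -> str:
    """Combine graph text and a task instruction into a single LLM input.

    Because the task is expressed in text, the same function covers
    node-level, edge-level, and graph-level questions alike.
    """
    return f"{textualize(graph)}\n\nTask: {task_instruction}\nAnswer:"


# Toy example: a two-node citation graph and a structural question.
g = Graph(nodes={0: "paper on GNNs", 1: "paper on LLMs"}, edges=[(0, 1)])
prompt = build_prompt(g, "What is the degree of node 0?")
print(prompt)
```

In such a setup, changing the task only changes the instruction string, while the model and input pipeline stay the same; the resulting prompt and its label would form one fine-tuning example for the LLM.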

LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model
