Abstract: Graph self-supervised learning (SSL) is now a go-to method for pre-training graph foundation models (GFMs). Graph data embeds a wide variety of knowledge patterns, such as node properties and clusters, which are crucial for learning generalized representations for GFMs. However, existing surveys of GFMs have several shortcomings: they do not cover the most recent progress comprehensively, they categorize self-supervised methods unclearly, and they take a limited architecture-based perspective restricted to certain types of graph models. As the ultimate goal of GFMs is to learn generalized graph knowledge, we provide a comprehensive survey of self-supervised GFMs from a novel knowledge-based perspective. We propose a knowledge-based taxonomy, which categorizes self-supervised graph models by the specific graph knowledge utilized. Our taxonomy consists of microscopic (nodes, links, etc.), mesoscopic (context, clusters, etc.), and macroscopic knowledge (global structure, manifolds, etc.). It covers a total of 9 knowledge categories and more than 25 pretext tasks for pre-training GFMs, as well as various downstream task generalization strategies. Such a knowledge-based taxonomy lets us re-examine graph models built on new architectures, such as graph language models, more clearly, and provides more in-depth insights for constructing GFMs.
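
The three knowledge levels can be made concrete with a small sketch. The Python below is a minimal illustration, assuming a hypothetical toy undirected graph (`EDGES`) and hypothetical helper functions (`micro_targets`, `link_targets`, `meso_targets`, `macro_targets`). It derives one example self-supervision target per level: node degrees and link labels (microscopic), nearest-seed cluster labels (mesoscopic), and whole-graph density and diameter (macroscopic). These targets are chosen only for illustration and are not the survey's own list of pretext tasks.

```python
from collections import deque

# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]


def build_adj(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def micro_targets(adj):
    """Microscopic (node level): the degree of every node as a node-property target."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def link_targets(adj):
    """Microscopic (link level): 1 for each observed node pair, 0 for each unobserved pair."""
    nodes = sorted(adj)
    return {(u, v): int(v in adj[u]) for i, u in enumerate(nodes) for v in nodes[i + 1:]}


def meso_targets(adj, seeds):
    """Mesoscopic: label each node with the seed whose multi-source BFS reaches it first."""
    label = {s: c for c, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    return label


def bfs_dist(adj, src):
    """Shortest-path hop counts from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def macro_targets(adj):
    """Macroscopic: whole-graph statistics (the diameter assumes a connected graph)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    diameter = max(max(bfs_dist(adj, s).values()) for s in adj)
    return {"density": 2 * m / (n * (n - 1)), "diameter": diameter}


if __name__ == "__main__":
    adj = build_adj(EDGES)
    print(micro_targets(adj))
    links = link_targets(adj)
    print(sum(links.values()), "positive pairs out of", len(links))
    print(meso_targets(adj, seeds=[0, 5]))
    print(macro_targets(adj))
```

On this toy graph, seeds 0 and 5 recover the two triangles as clusters, while the diameter of 3 is a property that no single node's neighbourhood reveals. This is the kind of distinction the micro/meso/macro split is meant to capture.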

What problem does this paper attempt to address?

The paper addresses several key shortcomings of existing surveys on self-supervised learning (SSL) for Graph Foundation Models (GFMs):

1. **Lack of comprehensiveness**: Existing surveys of GFMs do not cover the latest advances in this rapidly evolving field. For example, they do not discuss recent work on masked graph autoencoders or on learning graph manifolds.
2. **Unclear classification**: Current surveys simply sort graph self-supervised methods into three broad categories: generative, predictive, and contrastive. This coarse split fails to capture the diverse knowledge patterns embedded in graph structure and attributes. For instance, link prediction requires understanding local relationships between nodes, whereas cluster prediction requires understanding how nodes are distributed across the entire graph, yet the existing scheme does not distinguish between the two.
3. **Limited to specific architectures**: Existing surveys of graph SSL are either confined to Graph Neural Networks (GNNs) or focus heavily on language-model architectures and textual attributes, neglecting other structural patterns. Although some recent surveys divide the research into GNN, LLM, and GNN+LLM categories, they remain tied to specific backbone architectures rather than approaching the field from the perspective of mining general graph knowledge.

To address these issues, the paper proposes a knowledge-based taxonomy that categorizes self-supervised graph models by the specific graph knowledge they utilize: microscopic (e.g., nodes and links), mesoscopic (e.g., context and clusters), and macroscopic knowledge (e.g., global structure and manifolds).
This knowledge-based taxonomy provides a unified framework for analyzing both GNNs and recent graph language models, covering their pre-training tasks and downstream task generalization strategies, and offers valuable insights for the future development of graph foundation models.
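
The contrast between link prediction and cluster prediction can be checked on a toy graph. The sketch below is an illustration under stated assumptions, not a method from the paper. It uses common-neighbour counting as a stand-in for a link-prediction signal and connected components as a deliberately simplified stand-in for clustering. The graph (`EDGES`) and helpers (`build_adj`, `common_neighbors`, `components`) are hypothetical. Adding an edge that touches neither endpoint of the pair (0, 1) leaves the link signal unchanged but alters the cluster structure of the whole graph.

```python
# Hypothetical toy graph: a triangle {0, 1, 2}, a path 3-4-5, and a separate edge 6-7.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5), (6, 7)]


def build_adj(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Link-level signal: reads only the 1-hop neighbourhoods of u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def components(adj):
    """Cluster-level labels (connected components): each node gets the smallest node id in its component."""
    label = {}
    for start in sorted(adj):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in label:
                    label[w] = start
                    stack.append(w)
    return label


if __name__ == "__main__":
    before = build_adj(EDGES)
    after = build_adj(EDGES + [(5, 6)])  # a distant edge: touches neither node 0 nor node 1
    print("common neighbours of (0, 1):", common_neighbors(before, 0, 1), "->", common_neighbors(after, 0, 1))
    print("number of clusters:", len(set(components(before).values())), "->", len(set(components(after).values())))
```

The common-neighbour count for (0, 1) stays at 1, while the number of clusters drops from 3 to 2. A pretext task that predicts cluster membership therefore depends on information far from any single node pair.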

A Survey on Self-Supervised Graph Foundation Models: Knowledge-Based Perspective
