Scalable Multi-Source Pre-training for Graph Neural Networks

Mingkai Lin, Wenzhong Li, Xiaobin Hong, Sanglu Lu
DOI: https://doi.org/10.1145/3664647.3680924
2024-01-01
Abstract: Graph Neural Networks (GNNs) have proven effective in various scenarios. A key strategy involves pre-training on existing graphs to extract knowledge that can be transferred to improve performance on downstream tasks, reducing the need for extensive labeled data. However, previous works commonly assumed that pre-training and fine-tuning occur in the same or closely related domains. A limitation is that for each individual graph without accessible pre-training data, a GNN must be trained from scratch, which imposes high training overhead and hinders generalization. In this paper, we address the GNN multi-domain pre-training problem, which aims to pre-train a transferable GNN model on heterogeneous multi-source graph domains and then apply it to an unseen domain with minor fine-tuning costs. To this end, we propose a scaLAble Multi-source Pre-training (LAMP) method. For pre-training, LAMP presents a graph dual-distillation approach that distills massive knowledge from various graph domains into synthetic homogeneous graphs. Simultaneously, high-level meta-knowledge is extracted from the synthetic graphs to train the GNN model, whose capability can be adjusted to target graph contexts through a co-training modulation architecture. For fine-tuning, LAMP aligns the target graph distribution, graph context, and graph task, respectively, with the pretext, so that the downstream task in the unseen domain can be reshaped to leverage the transferable knowledge efficiently. Extensive experiments on datasets from four different graph domains show the superiority of LAMP.
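The abstract does not say how LAMP aligns a target graph's distribution with the pretext. Purely as an illustration of what "distribution alignment" can mean for node features, here is a minimal sketch that assumes simple per-dimension moment matching: each target feature dimension is standardized and then rescaled to the mean and standard deviation of the pretext (e.g. synthetic-graph) features. This is a stand-in technique, not the paper's method, and the helper names `column_stats` and `align_to_pretext` are hypothetical.

```python
from statistics import fmean, pstdev


def column_stats(features):
    """Per-dimension mean and population std of a node-feature matrix (list of rows)."""
    cols = list(zip(*features))
    return [fmean(c) for c in cols], [pstdev(c) for c in cols]


def align_to_pretext(target, pretext_mean, pretext_std, eps=1e-8):
    """Hypothetical moment matching (assumption, not LAMP's procedure):
    standardize each target feature dimension, then rescale it so its
    mean/std match the pretext statistics. eps guards constant columns."""
    t_mean, t_std = column_stats(target)
    return [
        [
            (x - m) / (s + eps) * ps + pm
            for x, m, s, pm, ps in zip(row, t_mean, t_std, pretext_mean, pretext_std)
        ]
        for row in target
    ]


if __name__ == "__main__":
    # Toy target-graph node features (2 nodes, 2 feature dimensions).
    target = [[1.0, 10.0], [3.0, 30.0]]
    # Assumed pretext statistics, e.g. computed from a synthetic pre-training graph.
    aligned = align_to_pretext(target, pretext_mean=[0.0, 5.0], pretext_std=[2.0, 1.0])
    print(aligned)
    print(column_stats(aligned))
```

Moment matching only aligns first- and second-order feature statistics; it ignores graph structure, context, and task, which the abstract says LAMP also aligns.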