Abstract: GNN-to-MLP distillation aims to use knowledge distillation (KD) to learn a computationally efficient multi-layer perceptron (student MLP) on graph data by mimicking the output representations of a teacher GNN. Existing methods mainly make the MLP mimic the GNN predictions over a few class labels. However, the class space may not be expressive enough to cover numerous diverse local graph structures, thus limiting the performance of knowledge transfer from GNN to MLP. To address this issue, we propose to learn a new, powerful graph representation space by directly labeling nodes' diverse local structures for GNN-to-MLP distillation. Specifically, we propose a variant of VQ-VAE to learn a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code. The discrete codes constitute a codebook as a new graph representation space that is able to identify different local graph structures of nodes with the corresponding code indices. Then, based on the learned codebook, we propose a new distillation target, namely soft code assignments, to directly transfer the structural knowledge of each node from GNN to MLP. The resulting framework, VQGraph, achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph, with better performance, infers 828x faster than GNNs, and also achieves average accuracy improvements of 3.90% and 28.05% over GNNs and stand-alone MLPs, respectively. Code: https://github.com/YangLing0818/VQGraph
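The core tokenization step of a VQ-VAE can be illustrated as a nearest-codebook lookup. Each node embedding is replaced by its closest codebook entry, and that entry's index serves as the node's discrete code. This is a minimal sketch under stated assumptions, not the VQGraph implementation: the embeddings stand in for a GNN encoder's output, the codebook is fixed rather than learned, squared Euclidean distance is assumed, and training is omitted (the codebook and commitment losses, plus whatever objective makes the tokenizer structure-aware). The names `tokenize` and `squared_distance` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(
    embeddings: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each node embedding to the index of its nearest codebook entry.

    Hypothetical helper: `embeddings` stands in for a GNN encoder's output,
    and `codebook` is assumed fixed (no training step is shown).
    """
    if not codebook:
        raise ValueError("codebook must contain at least one code")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in embeddings:
        # The discrete code is the arg-min over distances to all codes
        # (ties go to the lowest index).
        distances = [squared_distance(z, e) for e in codebook]
        k = distances.index(min(distances))
        indices.append(k)
        # The quantized vector is the selected codebook entry itself.
        quantized.append(list(codebook[k]))
    return indices, quantized


# Toy example: three 2-d node embeddings and a three-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
node_embeddings = [[0.1, 0.2], [4.8, 5.1], [0.9, 1.2]]
codes, _ = tokenize(node_embeddings, codebook)
print(codes)  # [0, 2, 1]
```

Nodes whose embeddings fall near the same codebook entry share a code index. This is how a codebook with many entries can separate more local patterns than a handful of class labels.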

What problem does this paper attempt to address?

The paper addresses a limitation in the representation space used by existing knowledge distillation methods from graph neural networks (GNNs) to multi-layer perceptrons (MLPs). Specifically, existing methods mainly learn the GNN's graph representation space through a small number of class labels. These labels may not be expressive enough to cover the many diverse local graph structures of nodes, which limits the performance of knowledge transfer from GNN to MLP. To solve this problem, the paper proposes a new, more powerful graph representation space for GNN-to-MLP distillation, learned by directly labeling the diverse local structures of nodes.

### Specific Problem Description

1. **Limitations of Existing Methods**:
   - Existing GNN-to-MLP distillation methods mainly rely on a few class labels to learn the graph representation space.
   - These class labels may not be expressive enough to capture the many different local graph structures of nodes, which limits the performance of knowledge transfer.
2. **Objectives**:
   - Propose a new graph representation space that captures the local graph structures of nodes more effectively.
   - Use this new space to achieve effective knowledge distillation from GNN to MLP and improve the performance of MLPs on graph data.

### Solutions

1. **Graph Tokenizer**:
   - Use a variant of the vector-quantized variational autoencoder (VQ-VAE) to learn a structure-aware graph tokenizer.
   - The tokenizer encodes the local substructure of each node as a discrete code. These codes form a codebook that serves as the new graph representation space.
2. **Soft Code Assignments**:
   - Based on the learned codebook, propose a new distillation target, soft code assignments, to directly transfer the structural knowledge of each node from GNN to MLP.
   - Achieve structure-aware knowledge distillation by maximizing the consistency between the GNN's and the MLP's soft code assignments over the discrete codes.
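The soft code assignment target can be sketched in a few lines. The sketch rests on two assumptions that the text above does not confirm. First, a node's assignment is taken to be a softmax over negative squared distances between its embedding and every codebook entry, with a temperature `tau`. Second, the distillation loss is taken to be the KL divergence from the teacher GNN's assignment to the student MLP's assignment, averaged over nodes. Both models are scored against the same frozen codebook. All function names are hypothetical, and pure Python is used only to keep the example self-contained.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_code_assignment(h: Vector, codebook: Sequence[Vector], tau: float = 1.0) -> List[float]:
    """Assumed form: softmax over negative squared distances to every code."""
    logits = [-squared_distance(h, e) / tau for e in codebook]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); eps guards against log(0) if a probability underflows."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def soft_code_distillation_loss(
    teacher_h: Sequence[Vector],
    student_h: Sequence[Vector],
    codebook: Sequence[Vector],
    tau: float = 1.0,
) -> float:
    """Mean KL from teacher (GNN) to student (MLP) soft code assignments."""
    losses = [
        kl_divergence(
            soft_code_assignment(t, codebook, tau),
            soft_code_assignment(s, codebook, tau),
        )
        for t, s in zip(teacher_h, student_h)
    ]
    return sum(losses) / len(losses)


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
teacher = [[0.1, 0.2], [4.8, 5.1]]  # GNN embeddings (toy values)
student = [[0.1, 0.2], [1.0, 1.0]]  # MLP embeddings; the second node is misplaced
loss = soft_code_distillation_loss(teacher, student, codebook)
print(f"distillation loss: {loss:.3f}")
```

Minimizing this loss pushes the MLP's embedding of each node toward the same region of the codebook as the GNN's embedding. The student therefore matches a distribution over many structure-aware codes rather than over a few class labels.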
### Experimental Results

- **Performance Improvement**:
  - VQGraph achieves a higher average accuracy than GNNs across multiple graph datasets, with an average improvement of 3.90%.
  - Compared with NOSMOG, the previous state-of-the-art GNN-to-MLP distillation method, VQGraph improves average accuracy by 1.39% across multiple datasets.
  - VQGraph also significantly outperforms stand-alone MLPs, with an average accuracy improvement of 28.05%.
- **Inference Speed**:
  - VQGraph infers 828 times faster than GNNs.

### Conclusion

By introducing a new graph representation space and a structure-aware distillation target, the paper addresses the limitations of existing GNN-to-MLP knowledge distillation methods. The resulting MLP-based inference on graph data is both faster and more accurate.

VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs
