Abstract: In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially-designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each expert to specialize in a certain region of the hidden representation space, we demonstrate experimentally that it is possible to derive considerably more consistent performance across multiple datasets.

What problem does this paper attempt to address?

The paper targets the inference latency that Graph Neural Networks (GNNs) incur in real-world deployments, especially on large-scale graphs. Its objectives can be summarized as follows:

1. **Addressing the latency issue of GNNs**: GNNs perform very well on node classification, but every layer must aggregate information from a node's neighbors before a prediction can be made. This raises computational cost and inference time, and on large graphs the neighborhood processing becomes resource-intensive.
2. **Improving knowledge distillation techniques**: Previous work transfers the knowledge of a trained GNN to a Multi-Layer Perceptron (MLP) through knowledge distillation, which exploits the efficiency and scalability of MLPs. However, existing distillation techniques perform inconsistently across settings (inductive versus transductive), especially on large graph datasets.
3. **Proposing a new student model**: The paper proposes Routing-by-Memory (RbM) as the student model. RbM is a variant of the Mixture-of-Experts (MoE) model that encourages each expert to specialize in a specific region of the hidden representation space, which yields more consistent performance.
4. **Enhancing the performance consistency of the student model**: By using RbM in place of a plain MLP, the paper shows that the student's performance can be made considerably more consistent even with a fixed number of parameters. Its experiments report that RbM improves performance across datasets of different sizes and outperforms existing baseline models.
In summary, the paper improves GNN knowledge distillation by replacing the usual MLP student with a new student model, RbM. The goal is to keep the low inference latency of a student that does not process neighborhoods while making its performance more consistent across transductive and inductive settings.
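The two ideas summarized above, distilling a teacher's predictions into a student and routing each input to a specialized expert, can be illustrated with a small sketch. This is not the paper's implementation: the source does not specify how RbM routes or which loss it uses. The sketch assumes that each expert owns a "memory" key vector, that a node's hidden representation is routed to the expert(s) with the nearest key under squared Euclidean distance, and that distillation minimizes the KL divergence between temperature-softened teacher and student class distributions. The names `route_by_memory`, `distillation_loss`, and `memory_keys` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions.

    Assumption: a standard soft-label distillation objective, used here
    only for illustration.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def route_by_memory(hidden, memory_keys, top_k=1):
    """Return indices of the top_k experts whose key is closest to `hidden`.

    Assumption: nearest-key routing, so each expert ends up handling one
    region of the hidden representation space.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    order = sorted(range(len(memory_keys)),
                   key=lambda i: sq_dist(hidden, memory_keys[i]))
    return order[:top_k]


# Three hypothetical experts, each with a 2-D memory key.
memory_keys = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
chosen = route_by_memory([0.9, 1.2], memory_keys)  # nearest key is index 1
loss = distillation_loss([0.0, 0.0], [2.0, 0.0])   # positive: student disagrees
```

Because routing depends only on the node's own hidden representation, a student of this kind needs no neighbor lookups at inference time, which is the latency property the summary attributes to MLP-style students.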

Graph Knowledge Distillation to Mixture of Experts

Knowledge Distillation Improves Graph Structure Augmentation for Graph Neural Networks

Learning Structure Perception MLPs on Graphs: a Layer-Wise Graph Knowledge Distillation Framework

Multi-Scale Distillation from Multiple Graph Neural Networks

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling

Mixture of Weak & Strong Experts on Graphs

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs

Online Adversarial Knowledge Distillation for Graph Neural Networks

Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework

Online Adversarial Distillation for Graph Neural Networks

Adaptive Hierarchical Knowledge Distillation from GNNs to MLPs

Frameless Graph Knowledge Distillation

Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation

A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation

Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

On Representation Knowledge Distillation for Graph Neural Networks

Knowledge Distillation Via Adaptive Meta-Learning for Graph Neural Network

Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation