Abstract: To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP. Despite this progress, comparatively little work has explored the reliability of different knowledge points (nodes) in GNNs, especially the roles they play during distillation. In this paper, we first quantify the knowledge reliability in GNNs by measuring the invariance of their information entropy to noise perturbations, from which we observe that different knowledge points (1) show different distillation speeds (temporally); (2) are differentially distributed in the graph (spatially). To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs. Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16%, averaged over 7 datasets and 3 GNN architectures.
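The idea of measuring how invariant a node's prediction entropy is to noise can be illustrated with a minimal sketch. This is not the paper's exact metric, which the abstract does not spell out. It assumes Gaussian noise on the input features, a mean squared entropy difference as the score, and a toy linear `predict` function standing in for a trained teacher GNN; `entropy_invariance`, `noise_std` and `n_trials` are hypothetical names.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs, eps=1e-12):
    # Shannon entropy of each row (one row per node).
    return -(probs * np.log(probs + eps)).sum(axis=1)

def entropy_invariance(predict, features, noise_std=0.1, n_trials=20, seed=0):
    """Per-node mean squared change in prediction entropy under feature noise.

    Assumption: a lower score means the node's entropy is more invariant
    to perturbation, i.e. the node is a more reliable knowledge point.
    """
    rng = np.random.default_rng(seed)
    base = entropy(softmax(predict(features)))
    total = np.zeros(features.shape[0])
    for _ in range(n_trials):
        noisy = features + rng.normal(0.0, noise_std, size=features.shape)
        total += (entropy(softmax(predict(noisy))) - base) ** 2
    return total / n_trials

# Toy stand-in for a teacher: a fixed linear map from features to logits.
# A real teacher GNN would also use the graph structure.
toy_rng = np.random.default_rng(1)
features = toy_rng.normal(size=(6, 4))
weights = toy_rng.normal(size=(4, 3))

def predict(x):
    return x @ weights

scores = entropy_invariance(predict, features)
```

Nodes with smaller entries in `scores` would be treated as more reliable under these assumptions; the noise scale and number of trials are free choices in this sketch.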

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve several key problems in knowledge distillation from graph neural networks (GNNs) to multi-layer perceptrons (MLPs):

1. **Reliability problem**: Existing GNN-to-MLP knowledge distillation methods usually assume that all knowledge points (nodes) in a GNN are equally important, ignoring the reliability of different knowledge points and the different roles they play during distillation. This may leave the distilled MLP under-confident, that is, unable to predict as confidently as the teacher GNN.
2. **Knowledge point distribution differences**: Knowledge points differ both temporally and spatially, that is, they are distilled at different speeds and follow different distribution patterns in the graph. Specifically:
   - **Temporal distribution**: Different knowledge points have different distillation speeds.
   - **Spatial distribution**: Reliable knowledge points tend to lie near the class centers, while unreliable nodes lie near the class boundaries.
3. **Insufficient supervision problem**: Without reliable supervision from the teacher GNN, the student MLP may fail to fully learn high-quality knowledge, resulting in poor performance.

To solve these problems, the authors propose a new framework, Knowledge-inspired Reliable Distillation (KRD). KRD quantifies the knowledge reliability of each node in the GNN and selects reliable nodes as additional supervision signals for training the student MLP, thereby improving distillation.

### Main contributions

- **Identify and describe the potential under-confidence problem**: Explain in detail how the problem manifests, what causes it, and how it affects MLP performance.
- **Propose a perturbation-invariance-based measurement method**: Used to quantify the reliability of different knowledge points in GNNs and to analyze their spatio-temporal roles in the distillation process.
- **Propose the KRD framework**: Use the quantified GNN knowledge to select reliable nodes as additional supervision signals, so that the MLP is trained more effectively.

With these improvements, KRD improves over vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16%, averaged over 7 datasets and 3 GNN architectures.
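The step of selecting reliable nodes as additional supervision can be sketched as follows. This is only an illustration under stated assumptions, not the KRD implementation: the linear mapping from scores to sampling probabilities, the Bernoulli sampling, and the KL-divergence loss on the sampled nodes are all choices made here, and every function name is hypothetical.

```python
import numpy as np

def reliability_probs(scores):
    """Map per-node instability scores to sampling probabilities in [0, 1].

    Assumption: a lower score means a more reliable node, so the most
    stable node gets probability 1 and the least stable gets 0.
    """
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.ones_like(scores)
    return 1.0 - (scores - scores.min()) / span

def sample_reliable(probs, rng):
    # Independent Bernoulli draw per node; True marks a sampled node.
    return rng.random(probs.shape[0]) < probs

def kl_distillation_loss(teacher_probs, student_probs, mask, eps=1e-12):
    """Mean KL(teacher || student) over the sampled nodes only."""
    if not mask.any():
        return 0.0
    t = teacher_probs[mask]
    s = student_probs[mask]
    return float((t * (np.log(t + eps) - np.log(s + eps))).sum(axis=1).mean())

# Toy usage with made-up scores and class distributions.
rng = np.random.default_rng(0)
probs = reliability_probs([0.0, 0.5, 1.0])
mask = sample_reliable(probs, rng)
teacher = np.array([[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]])
student = np.array([[0.7, 0.3], [0.5, 0.5], [0.5, 0.5]])
loss = kl_distillation_loss(teacher, student, mask)
```

In a training loop, a term like `loss` would be added to the usual distillation objective, so that the student receives extra signal only from nodes judged reliable.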

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
