Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
DOI: https://doi.org/10.48550/arXiv.2306.05628
2023-06-09
Abstract: To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP. Despite their great progress, comparatively little work has been done to explore the reliability of different knowledge points (nodes) in GNNs, especially their roles played during distillation. In this paper, we first quantify the knowledge reliability in GNN by measuring the invariance of their information entropy to noise perturbations, from which we observe that different knowledge points (1) show different distillation speeds (temporally); (2) are differentially distributed in the graph (spatially). To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs. Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16% averaged over 7 datasets and 3 GNN architectures.
Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve several key problems in knowledge distillation from graph neural networks (GNNs) to multi-layer perceptrons (MLPs):

1. **Reliability problem**: Existing GNN-to-MLP knowledge distillation methods usually treat all knowledge points (nodes) in the GNN as equally important, ignoring how reliable each knowledge point is and the different roles the points play during distillation. This may leave the distilled MLP under-confident, that is, unable to predict as confidently as the teacher GNN.
2. **Knowledge point distribution differences**: Knowledge points differ both temporally and spatially during distillation:
   - **Temporal distribution**: Different knowledge points are distilled at different speeds.
   - **Spatial distribution**: Reliable knowledge points tend to lie near the class centers, while unreliable nodes lie near the class boundaries.
3. **Insufficient supervision problem**: Without reliable supervision from the teacher GNN, the student MLP may fail to fully learn high-quality knowledge, resulting in poor performance.

To solve these problems, the authors propose a new framework, Knowledge-inspired Reliable Distillation (KRD). KRD quantifies the knowledge reliability of each node in the GNN and selects reliable nodes as additional supervision signals for training the student MLP, thereby improving the distillation effect.

### Main contributions

- **Identify and describe the potential under-confidence problem**: The paper explains how the problem manifests, what causes it, and how it affects MLP performance.
- **Propose a perturbation-invariance-based measurement method**: The method quantifies the reliability of different knowledge points in GNNs and is used to analyze their spatio-temporal roles during distillation.
- **Propose the KRD framework**: KRD uses the quantified GNN knowledge to screen out reliable nodes as additional supervision signals, so that the MLP is trained more effectively.

With these improvements, KRD is reported to improve over vanilla MLPs by 12.62% and to outperform its teacher GNNs by 2.16%, averaged over 7 datasets and 3 GNN architectures.
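The perturbation-invariance idea and the selection of reliable nodes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_teacher` is a hypothetical stand-in for a trained GNN, and the specific choices (Gaussian feature noise with scale `sigma`, reliability as the inverse mean squared entropy change, and a sampling probability scaled by the power `alpha`) are assumptions made only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy of each node's predicted class distribution
    return -(p * np.log(p + eps)).sum(axis=1)

def toy_teacher(adj_norm, x, w):
    # Hypothetical teacher: one round of neighbor averaging, then a linear layer
    return softmax(adj_norm @ x @ w)

def reliability_scores(adj_norm, x, w, sigma=0.1, n_samples=20, seed=0, eps=1e-8):
    # Assumed metric: a node is reliable if its prediction entropy barely
    # changes when Gaussian noise is added to the input features.
    rng = np.random.default_rng(seed)
    base = entropy(toy_teacher(adj_norm, x, w))
    sq_change = np.zeros(x.shape[0])
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)
        sq_change += (entropy(toy_teacher(adj_norm, noisy, w)) - base) ** 2
    return 1.0 / (sq_change / n_samples + eps)

def sample_reliable_nodes(rho, alpha=1.0, seed=0):
    # Assumed sampling rule: keep node i with probability (rho_i / max rho)^alpha
    rng = np.random.default_rng(seed)
    prob = (rho / rho.max()) ** alpha
    return rng.random(rho.shape[0]) < prob, prob

# Tiny 4-node path graph with self-loops, row-normalized
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)
adj_norm = adj / adj.sum(axis=1, keepdims=True)
data_rng = np.random.default_rng(1)
x = data_rng.standard_normal((4, 3))   # node features
w = data_rng.standard_normal((3, 2))   # toy teacher weights, 2 classes

rho = reliability_scores(adj_norm, x, w)
mask, prob = sample_reliable_nodes(rho)
print("reliability:", rho)
print("selected as extra supervision:", np.flatnonzero(mask))
```

In a distillation loop, the teacher's soft labels for the nodes in `mask` would serve as the additional supervision for the student MLP. How those nodes enter the loss is left out here because this summary does not specify it.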