Abstract: While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods mainly distill deep features from intermediate hidden layers, so the significance of logit-layer distillation has been largely overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and we eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets show the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
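
The abstract does not state the loss formulas, so the following is only a minimal sketch of the kind of reformulation it describes, not the authors' implementation. It assumes the classical KD loss is the KL divergence between teacher and student softmax outputs, and it uses the standard identity that this KL splits into a binary target-versus-rest term plus a non-target term scaled by (1 - teacher confidence). The function names, the temperature `tau`, and the weights `alpha` and `beta` are hypothetical; in particular, the per-sample weighting scheme of DGKD is not reproduced here.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    # Row-wise KL(p || q); assumes strictly positive probabilities.
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def classical_kd(teacher_logits, student_logits, tau=1.0):
    # Classical logits-based KD: KL between softened teacher and student outputs.
    return kl(softmax(teacher_logits / tau), softmax(student_logits / tau))

def decoupled_terms(teacher_logits, student_logits, targets, tau=1.0):
    # Returns (target-class term, non-target-class term, teacher confidence),
    # one value per node. `targets` holds the class index treated as the
    # target for each node (an assumption of this sketch).
    p_t = softmax(teacher_logits / tau)
    p_s = softmax(student_logits / tau)
    rows = np.arange(len(targets))
    conf_t = p_t[rows, targets]          # teacher probability of target class
    conf_s = p_s[rows, targets]          # student probability of target class

    # Target-class term: KL between binary (target, not-target) distributions.
    b_t = np.stack([conf_t, 1.0 - conf_t], axis=1)
    b_s = np.stack([conf_s, 1.0 - conf_s], axis=1)
    tc = kl(b_t, b_s)

    # Non-target term: KL between distributions renormalized over the
    # remaining classes only.
    num_classes = p_t.shape[1]
    mask = np.ones_like(p_t, dtype=bool)
    mask[rows, targets] = False
    nt_t = p_t[mask].reshape(-1, num_classes - 1) / (1.0 - conf_t)[:, None]
    nt_s = p_s[mask].reshape(-1, num_classes - 1) / (1.0 - conf_s)[:, None]
    nc = kl(nt_t, nt_s)
    return tc, nc, conf_t

def decoupled_loss(teacher_logits, student_logits, targets,
                   alpha=1.0, beta=1.0, tau=1.0):
    # Hypothetical decoupled variant: the (1 - confidence) factor on the
    # non-target term is replaced by a free weight `beta`.
    tc, nc, _ = decoupled_terms(teacher_logits, student_logits, targets, tau)
    return np.mean(alpha * tc + beta * nc)

# Sanity check on random logits: the classical loss equals
# target term + (1 - teacher confidence) * non-target term.
rng = np.random.default_rng(0)
t_logits = rng.normal(size=(6, 4))
s_logits = rng.normal(size=(6, 4))
y = np.array([0, 1, 2, 3, 0, 1])
tc, nc, conf = decoupled_terms(t_logits, s_logits, y)
assert np.allclose(classical_kd(t_logits, s_logits), tc + (1.0 - conf) * nc)
```

In the classical form, the non-target term is multiplied by (1 - teacher confidence), so it shrinks exactly when the teacher is most confident; this is the negative correlation the abstract refers to. Replacing that factor with a free weight is one way to remove the coupling.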

Compressing deep graph convolution network with multi-staged knowledge distillation

GKD: Semi-supervised Graph Knowledge Distillation for Graph-Independent Inference

Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

Triplet Knowledge Distillation Networks for Model Compression

Hierarchical Knowledge Squeezed Adversarial Network Compression

Multi-Scale Distillation from Multiple Graph Neural Networks

Few Sample Knowledge Distillation for Efficient Network Compression

A Model Compression Method Using Significant Data and Knowledge Distillation

Distilling Knowledge from Graph Convolutional Networks

Progressive Network Grafting for Few-Shot Knowledge Distillation

Collaborative Knowledge Distillation Via Multiknowledge Transfer

Multiple-Stage Knowledge Distillation

Compressing the Multiobject Tracking Model Via Knowledge Distillation

A Novel Multi-Knowledge Distillation Approach

CDFKD-MFS: Collaborative Data-free Knowledge Distillation Via Multi-level Feature Sharing

Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs

Enhanced Scalable Graph Neural Network via Knowledge Distillation

Data Efficient Stagewise Knowledge Distillation

Deep Collective Knowledge Distillation