Narrow the Input Mismatch in Deep Graph Neural Network Distillation

Qiqi Zhou,Yanyan Shen,Lei Chen
DOI: https://doi.org/10.1145/3580305.3599442
2023-01-01
Abstract:Graph neural networks (GNNs) have been widely studied for modeling graph-structured data. Thanks to the over-parameterization and large receptive field of deep GNNs, "deep" is a promising direction to develop GNNs further and has shown some superior performances. However, the over-stacked structures of deep architectures incur high inference cost in deployment. To compress deep GNNs, we can use knowledge distillation (KD) to make shallow student GNNs mimic teacher GNNs. Existing KD methods in graph domain focus on constructing diverse supervision on embedding or prediction produced by student GNNs, but overlook the gap of the receptive field (i.e., input information) between student and teacher, which brings difficulties to KD. We call this gap "input mismatch". To alleviate this problem, we propose a lightweight stochastic extended module to provide an estimation for missing input information for student GNNs. The estimator models the distribution of missing information. Specifically, we model the missing information as an independent distribution from graph level and a conditional distribution from node level (given the condition of observable input). These two estimates are optimized using a Bayesian methodology and combined into a balanced estimate as additional input to student GNNs. To the best of our knowledge, we are the first to address the "input mismatch" problem in deep GNNs distillation. Experiments on extensive benchmarks demonstrate that our method outperforms existing KD methods for GNNs in distillation performance, which confirms that the estimations are reasonable and effective.
What problem does this paper attempt to address?