Accelerating GNN Inference by Soft Channel Pruning

Wenbo Zhang, Jingwei Sun, Guangzhong Sun
DOI: https://doi.org/10.1109/paap56126.2022.10010603
2022-01-01
Abstract: Graph Neural Networks (GNNs) are effective models for processing graph-structured data. With the continuous growth of graph data scale and the deepening of graph neural network layers, the heavy cost of GNN inference has greatly limited its application in real-time tasks. This paper focuses on accelerating GNN inference. We first measure the share of execution time taken by each stage of the inference process for commonly used GNN models, and analyze the performance characteristics of the different stages. We find that the critical performance factor of GNN inference is the feature dimension, which differs from CNN and NLP models. Therefore, we propose a soft channel pruning method with a ladder pruning pattern. It reduces the computation spent on unimportant graph node features and thereby accelerates inference, while preserving the inference accuracy of GNNs. Experimental validation on graph datasets of different scales shows that our method effectively reduces inference latency and achieves a 2×–7× speedup. Compared with existing pruning methods, it also obtains higher inference accuracy at a comparable speedup ratio.
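To make the idea of pruning along the feature dimension concrete, here is a minimal, generic sketch of magnitude-based channel masking in plain Python. It is not the authors' method: the abstract specifies neither the importance criterion nor the ladder pruning pattern, so the L1-norm score, the `keep_ratio` parameter, and all function names below are assumptions made for illustration only.

```python
# Illustrative sketch only: the scoring rule and keep_ratio are assumptions,
# not details taken from the paper.

def channel_importance(weight):
    """Score each output channel (column) of a weight matrix by its L1 norm.

    `weight` is a list of rows, each row a list of floats.
    """
    n_out = len(weight[0])
    return [sum(abs(row[j]) for row in weight) for j in range(n_out)]


def soft_prune_mask(weight, keep_ratio):
    """Return a 0/1 mask that keeps the highest-scoring channels.

    At least one channel is always kept.
    """
    scores = channel_importance(weight)
    n_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if j in kept else 0 for j in range(len(scores))]


def apply_mask(weight, mask):
    """Zero the masked columns instead of deleting them.

    Keeping the zeroed entries in place is what makes the pruning "soft" in
    this sketch: a later training step could update them again.
    """
    return [[w * m for w, m in zip(row, mask)] for row in weight]


# Hypothetical 2x4 layer weight: 2 input features, 4 output channels.
weight = [[0.5, -0.01, 2.0, 0.1],
          [-0.4, 0.02, -1.5, 0.0]]
mask = soft_prune_mask(weight, keep_ratio=0.5)
pruned = apply_mask(weight, mask)
```

In this sketch, masking alone does not reduce latency. A speedup would come from physically dropping the masked columns before inference, which shrinks the feature dimension that the abstract identifies as the critical performance factor.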