GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang
DOI: https://doi.org/10.1109/sc41405.2020.00076
2020-11-01
Abstract: The acceleration of Graph Neural Networks (GNNs) requires efficient and framework-compatible Sparse-Dense Matrix-Matrix Multiplication (SpMM). From the compatibility perspective, the sophisticated sparse matrix representations in state-of-the-art SpMM designs cause heavy preprocessing overhead for the framework. From the efficiency perspective, optimizations for SpMV (Sparse Matrix-Vector multiplication) do not apply well to SpMM, leading to redundant and uncoalesced global memory access. We propose GE-SpMM, which takes the CSR format consistent with GNN frameworks to enable integration without the format transformation overhead. We use Coalesced Row Caching to ensure coalesced access to both sparse and dense data in the global memory. We use Coarse-grained Warp Merging to reduce redundant data loading among GPU warps. Experiments on a real-world graph dataset demonstrate up to $1.41\times$ speedup over Nvidia cuSPARSE [1] and up to $1.81\times$ over GraphBLAST [2]. We embed GE-SpMM in GNN frameworks and get up to $3.67\times$ speedup on popular GNN models like GCN [3] and GraphSAGE [4]. The project is open-sourced at https://github.com/hgyhungry/ge-spmm
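For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of SpMM over a CSR matrix. It is not the GE-SpMM kernel and does not model Coalesced Row Caching or Coarse-grained Warp Merging; it only shows the computation those techniques accelerate. The function name `csr_spmm` and the array names `indptr`, `indices`, and `data` are assumptions chosen for illustration, following the common CSR naming convention.

```python
def csr_spmm(indptr, indices, data, dense):
    """Reference SpMM: returns C = A @ B, with A in CSR form and B dense.

    indptr  -- row pointers of A (length n_rows + 1)
    indices -- column index of each nonzero of A
    data    -- value of each nonzero of A
    dense   -- B as a list of rows (each row a list of floats)
    """
    n_rows = len(indptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        out_row = out[i]
        # Walk the nonzeros of row i of A.
        for p in range(indptr[i], indptr[i + 1]):
            a = data[p]
            b_row = dense[indices[p]]  # row of B selected by the column index
            # Each nonzero scales a whole row of B; in SpMV this would be
            # a single scalar, which is why SpMV optimizations transfer poorly.
            for j in range(n_cols):
                out_row[j] += a * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]] in CSR form
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = csr_spmm(indptr, indices, data, B)
print(C)  # [[11.0, 14.0], [15.0, 18.0], [19.0, 28.0]]
```

In this sketch the sparse row (`indices`, `data`) is reread for every output column, and each nonzero touches a full row of `B`. A GPU kernel has to arrange those same reads across threads, which is where the abstract's concerns about redundant and uncoalesced global memory access arise.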