EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs.

Junqing Lin, Honghe Zhang, Xiaolong Shi, Jingwei Sun, Xianzhi Yu, Jun Yao, Guangzhong Sun
DOI: https://doi.org/10.1145/3605573.3605632
2023-01-01
Abstract: As deep neural networks (DNNs) become increasingly large and complicated, pruning techniques have been proposed to reduce memory footprint and make inference more efficient. The most critical kernel for executing pruned sparse DNNs on GPUs is Sparse-dense Matrix Multiplication (SpMM). Recent tensor compilers can generate high-performance SpMM code, but they often take a long time to iteratively search candidate configurations. This long compilation time slows down the cycle of exploring better DNN architectures or pruning algorithms. In this paper, we propose EC-SpMM to efficiently generate high-performance SpMM kernels for sparse DNN inference. Based on an analysis of the layout of nonzero elements, a characterization of GPU architecture, and a rank-based cost model, EC-SpMM effectively reduces the search space and eliminates candidates that are likely to perform poorly. Experimental results show that EC-SpMM reduces compilation time by 35×, while the generated SpMM kernels perform comparably to or even better than those of the state-of-the-art sparse tensor compiling solution.
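For readers unfamiliar with the operation, the following is a minimal reference sketch of SpMM, computing C = A × B with a sparse A and a dense B. It is not the EC-SpMM kernel or any part of its compiler; the CSR storage layout and the names `spmm_csr`, `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration, since the abstract does not specify a format.

```python
def spmm_csr(row_ptr, col_idx, values, dense, n_cols):
    """Reference SpMM: C = A @ B, with A in CSR form and B a dense list of rows.

    row_ptr, col_idx, values: CSR arrays of the sparse matrix A (assumed layout).
    dense: B as a list of rows, each of length n_cols.
    """
    n_rows = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Visit only the nonzero entries stored for row i of A.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            b_row = dense[col_idx[k]]
            # Each nonzero scales one row of B and accumulates into row i of C.
            for j in range(n_cols):
                out[i][j] += a * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3]]   stored in CSR form
row_ptr = [0, 2, 3]
col_idx = [0, 2, 2]
values = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = spmm_csr(row_ptr, col_idx, values, B, n_cols=2)
```

The amount of work per output row depends on how many nonzeros that row holds and where they fall, which is one reason the layout of nonzero elements matters when choosing a GPU kernel configuration.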