Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM

Hui-Hsin Liao, Chao-Lin Lee, Jenq-Kuen Lee, Wei-Chih Lai, Ming-Yu Hung, Chung-Wen Huang
DOI: https://doi.org/10.1145/3458744.3473352
2021-08-09
Abstract: Recently, machine learning has been widely adopted in various scenarios, especially on edge devices. These edge devices, such as smartphones and IoT devices, are usually powered by limited batteries. Therefore, increasing performance while saving power has become one of the critical issues in the development of deep learning frameworks. Numerous optimizations and methodologies have been developed to improve CNN performance. In this paper, we focus on the convolution layer in CNNs, which is one of the most computationally demanding operators in neural networks, so improving convolution contributes significantly to the performance of the entire model. We identify opportunities for sparse convolution, in which certain matrices have high sparsity. We propose a flow in TVM that provides sparse convolution with weight pruning. In our flow, we maximize sparsity by pruning certain weights and retraining the model. With our proposed flow, TVM model runtime can achieve an 11.42x speedup on average on ImageNet-based models compared to the original flow.
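The idea summarized in the abstract, pruning convolution weights and then computing the convolution as a compressed sparse matrix multiplication, can be illustrated with a minimal NumPy sketch. This is not the authors' TVM flow and does not use any TVM API: the function names (`prune_by_magnitude`, `dense_to_csr`, `im2col`, `csr_matmul`, `sparse_conv2d`) are hypothetical, magnitude-based pruning and the CSR format are assumed choices, the retraining step is omitted, and the convolution is restricted to stride 1 with no padding.

```python
import numpy as np


def prune_by_magnitude(weights, sparsity):
    """Zero out (at least) the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)


def dense_to_csr(matrix):
    """Return (data, indices, indptr) for a 2-D array in CSR form."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        cols = np.nonzero(row)[0]
        data.extend(row[cols])
        indices.extend(cols)
        indptr.append(len(data))
    return (np.array(data, dtype=matrix.dtype),
            np.array(indices, dtype=np.int64),
            np.array(indptr, dtype=np.int64))


def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[ci, i:i + out_h, j:j + out_w].ravel()
                row += 1
    return cols, out_h, out_w


def csr_matmul(data, indices, indptr, dense):
    """Multiply a CSR matrix by a dense matrix, touching only stored nonzeros."""
    n_rows = len(indptr) - 1
    out = np.zeros((n_rows, dense.shape[1]), dtype=dense.dtype)
    for r in range(n_rows):
        start, end = indptr[r], indptr[r + 1]
        out[r] = data[start:end] @ dense[indices[start:end]]
    return out


def sparse_conv2d(x, weights, sparsity):
    """Prune (K, C, kh, kw) weights, then convolve x via CSR matrix multiplication."""
    k, c, kh, kw = weights.shape
    pruned = prune_by_magnitude(weights, sparsity)
    data, indices, indptr = dense_to_csr(pruned.reshape(k, c * kh * kw))
    cols, out_h, out_w = im2col(x, kh, kw)
    out = csr_matmul(data, indices, indptr, cols)
    return out.reshape(k, out_h, out_w), pruned
```

In this sketch the work per output row scales with the number of surviving weights rather than the full filter size, which is the property a sparse convolution flow relies on. Any speedup on real hardware depends on the sparsity level and the kernel implementation.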