SpWMM: A High-Performance Sparse-Winograd Matrix-Matrix Multiplication Accelerator for CNNs.

Di Wu, Xitian Fan, Wei Cao, Lingli Wang
DOI: https://doi.org/10.1109/icfpt47387.2019.00041
2021-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: In recent years, many CNN accelerators have been proposed that exploit network sparsity to reduce both computation and memory. However, these accelerators either cannot exploit the sparsity of both activations and weights, or cannot achieve stable performance because they rely on a static scheduling strategy, which is vulnerable to the sparsity distribution. This paper proposes a dynamic scheduling strategy and a balanced compressed sparse row (BCSR) format to address these two issues efficiently. A set-associative structure is presented to trade off load balance against logic overhead. We propose SpWMM to accelerate CNN inference; it is the first work to implement both sparse Winograd convolution and sparse fully-connected (FC) layers. On contemporary neural networks, this work achieves: (1) 2.6 Top/s for Winograd convolution and 525 Gop/s for 1×1 convolution and FC layers in the 4-way set-associative design on the Xilinx ZC706 platform, and (2) 6.5 Top/s for Winograd convolution and 1.2 Top/s for 1×1 convolution and FC layers in the 16-way set-associative design on the Xilinx VCU1525 platform. Compared with state-of-the-art works on the same platform, the 4-way design achieves a 2.0× speedup.
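For readers unfamiliar with the term, the following is a minimal sketch of why sparsity helps Winograd convolution. It uses the standard one-dimensional Winograd F(2,3) minimal filtering algorithm, not the SpWMM hardware design, scheduling strategy, or BCSR format described in the abstract. The function names `winograd_f23` and `direct_corr` are hypothetical, chosen only for illustration. The sketch computes two outputs of a 3-tap filter with at most four multiplications and skips any multiplication whose transformed weight or transformed input is zero.

```python
from fractions import Fraction

def direct_corr(d, g):
    """Reference: two outputs of a 3-tap sliding dot product over 4 inputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

def winograd_f23(d, g):
    """Standard Winograd F(2,3): returns (outputs, multiplications performed).

    Illustrative only; zero-skipping here stands in for the general idea of
    exploiting sparsity in the transformed domain.
    """
    half = Fraction(1, 2)
    # Filter transform (done once per filter in practice).
    u = [g[0], (g[0] + g[1] + g[2]) * half, (g[0] - g[1] + g[2]) * half, g[2]]
    # Input transform.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise product, skipping pairs where either operand is zero.
    m, mults = [], 0
    for ui, vi in zip(u, v):
        if ui == 0 or vi == 0:
            m.append(0)
        else:
            m.append(ui * vi)
            mults += 1
    # Output transform.
    y = [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]
    return y, mults

y, mults = winograd_f23([1, 2, 3, 4], [1, 0, -1])
```

In this example the transformed filter has two zero entries, so only two of the four element-wise multiplications are performed, whereas the direct computation uses six.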