Sparkle: A High Efficient Sparse Matrix Multiplication Accelerator for Deep Learning

Shiyao Xu, Jingfei Jiang, Jinwei Xu, Chaorun Liu, Yuanhong He, Xiaohang Liu, Lei Gao
DOI: https://doi.org/10.1109/iccd56317.2022.00077
2022-01-01
Abstract: Deep learning (DL) technology is applied to a wide range of intelligent tasks across vision, language, recommendation systems, etc. Large DL models with high sparsity have become critical for various intelligent applications and require energy-efficient hardware accelerators. Sparse-dense matrix multiplication (SpMM) is a key computation kernel widely used in most sparse and large DL workloads. However, traditional computing platforms such as CPUs, GPUs, and AI chips with regular processing units are limited in their support for sparsity by their fixed structures. In this work, a dedicated SpMM accelerator named Sparkle is proposed, which achieves high performance and high computational efficiency. A block-wise arrangement approach is proposed in Sparkle to process matrix multiplications. A novel compressed sparsity format, the pointer-bitmap, is designed to simplify the decoding process and improve the efficiency of data loading. Grouped PEs and a configurable hierarchical reduction network are deployed to leverage sparsity, further enhancing the utilization of the compute resources. Sparkle is implemented on the Xilinx xqvu11p FPGA. A diverse set of matrices from DL workloads is evaluated, and Sparkle achieves 2.1× higher energy efficiency than the NVIDIA TITAN X GPU. Our experiments also show that Sparkle achieves roughly 26% higher compute efficiency than the state-of-the-art sparse accelerator SIGMA.
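To make the SpMM kernel concrete, the following is a minimal software sketch of a sparse-dense multiply over a generic bitmap-compressed sparse matrix. It is not Sparkle's pointer-bitmap format or its hardware dataflow, whose details the abstract does not give; the function names `compress_bitmap` and `spmm_bitmap` and the per-row layout (a 0/1 mask plus packed nonzero values) are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def compress_bitmap(a: Matrix) -> Tuple[List[List[int]], List[List[float]]]:
    """Hypothetical layout: per row, a 0/1 mask of nonzero positions
    plus the nonzero values packed in column order."""
    bitmaps = [[1 if v != 0 else 0 for v in row] for row in a]
    values = [[v for v in row if v != 0] for row in a]
    return bitmaps, values


def spmm_bitmap(bitmaps: List[List[int]],
                values: List[List[float]],
                b: Matrix) -> Matrix:
    """Multiply the bitmap-compressed sparse matrix by the dense matrix b.
    Zero entries of the sparse operand are skipped entirely."""
    n_cols = len(b[0])
    out: Matrix = []
    for mask, vals in zip(bitmaps, values):
        acc = [0.0] * n_cols
        k = 0  # index into this row's packed nonzero values
        for col, bit in enumerate(mask):
            if bit:
                a_val = vals[k]
                k += 1
                # Scale row `col` of b by the nonzero and accumulate.
                for j in range(n_cols):
                    acc[j] += a_val * b[col][j]
        out.append(acc)
    return out


if __name__ == "__main__":
    sparse = [[0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]]
    dense = [[1.0, 2.0],
             [3.0, 4.0],
             [5.0, 6.0]]
    masks, packed = compress_bitmap(sparse)
    print(spmm_bitmap(masks, packed, dense))
```

In this sketch the work done is proportional to the number of nonzeros in the sparse operand rather than to its full size, which is the property a sparsity-aware accelerator aims to exploit.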