SPSA: Exploring Sparse-Packing Computation on Systolic Arrays from Scratch

Minjin Tang, Mei Wen, Jianchao Yang, Zeyu Xue, Junzhong Shen
DOI: https://doi.org/10.1109/tcad.2024.3434359
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Sparse Matrix-Matrix Multiplication (SpMM) and Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) are essential computational kernels in domains such as graph analytics and scientific computation. Systolic arrays have traditionally been employed as specialized architectures for compute-intensive problems such as matrix multiplication, but they are inefficient on sparse matrices: processing elements (PEs) that hold zero-valued entries perform operations that do not contribute to the final result. To address this issue, we propose SPSA, a framework that uses a sparse-packing algorithm suited to systolic arrays to accelerate sparse matrix computations. Packing the rows or columns of the sparse matrix significantly reduces the number of zero-valued entries and improves matrix density. We also introduce CSXD, the first data representation format tailored to systolic arrays, which further improves storage and computational efficiency. Importantly, our adaptation scheme delivers acceleration benefits even with limited resources. With sparse packing, SPSA achieves a 5.2x performance improvement over the dense baseline, rising to 6.4x with CSXD. CSXD also improves storage efficiency by 15.0x on average. In extensive evaluations, SPSA outperforms previous designs on CPU, GPU, and ASIC platforms. Finally, in end-to-end evaluations, SPSA achieves a 3.9x performance improvement across the BERT, VGG19, and ResNet50 workloads.
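The abstract does not spell out the packing algorithm, so the following is only a minimal sketch of the general idea of packing columns to raise density, not SPSA's actual method. It assumes a simple greedy, conflict-free rule: two columns may share one packed column only if their nonzero rows do not overlap. The function names (`pack_columns`, `unpack_columns`, `density`) and the per-entry `col_index` side table are hypothetical and are not the CSXD format.

```python
import numpy as np


def pack_columns(a):
    """Greedily merge columns whose nonzero rows do not overlap (assumed rule)."""
    rows, cols = a.shape
    groups = []    # original column indices assigned to each packed column
    supports = []  # boolean mask of occupied rows for each packed column
    for j in range(cols):
        mask = a[:, j] != 0
        if not mask.any():
            continue  # an all-zero column contributes nothing
        for g, occupied in enumerate(supports):
            if not (occupied & mask).any():  # no row conflict: merge here
                groups[g].append(j)
                supports[g] = occupied | mask
                break
        else:
            groups.append([j])
            supports.append(mask.copy())

    packed = np.zeros((rows, len(groups)), dtype=a.dtype)
    # col_index remembers which original column each packed entry came from;
    # -1 marks an empty slot.
    col_index = np.full((rows, len(groups)), -1, dtype=int)
    for g, members in enumerate(groups):
        for j in members:
            mask = a[:, j] != 0
            packed[mask, g] = a[mask, j]
            col_index[mask, g] = j
    return packed, col_index


def unpack_columns(packed, col_index, cols):
    """Rebuild the original matrix from the packed form and its index table."""
    out = np.zeros((packed.shape[0], cols), dtype=packed.dtype)
    r, g = np.nonzero(col_index >= 0)
    out[r, col_index[r, g]] = packed[r, g]
    return out


def density(m):
    """Fraction of entries that are nonzero."""
    return np.count_nonzero(m) / m.size


a = np.array([[1, 0, 0, 2],
              [0, 3, 0, 0],
              [0, 0, 4, 0],
              [5, 0, 0, 0]])
packed, col_index = pack_columns(a)
print(packed.shape, density(a), density(packed))
```

In this toy case the first three columns have disjoint nonzero rows and collapse into one packed column, so the same five nonzeros occupy a 4x2 array instead of a 4x4 one. A systolic array fed the packed matrix would spend fewer PE slots on zeros, at the cost of carrying the index table needed to route each product to the right output.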