A Simple and Efficient Storage Format for SIMD-accelerated SpMV

Bian Haodong,Tsinghua University,Dong Runting,Guo Yuluo,Liu Lingbin,Huang Dongqiang,Wang Xiaoying
DOI: https://doi.org/10.1007/s10586-021-03340-1
2021-01-01
Cluster Computing
Abstract:SpMV (Sparse matrix-vector multiplication) is an essential component in scientific computing and has attracted the attention of researchers in related fields at home and abroad. With the continuous expansion of matrix data, the efficient parallel SpMV algorithm has become a research hotspot for research experts in related fields. The sparse matrix compression format as a critical point to improve computing performance can effectively save storage space and efficiently cooperate with the advantages of the processor system structure to give full play to performance. This paper proposes a new sparse matrix storage format CSR2 (Compressed Sparse Row 2). It is a new single format and suitable for processor platforms with SIMD (Single Instruction Multiple Data) vectorizations. The format operation of CSR2 is easy to implement with a low overhead of conversion. We compared the SpMV algorithm based on CSR2 with the most advanced single format CSR5 (Compressed Sparse Row 5) and Intel MKL (Intel Math Kernel Library) on the mainstream high-performance processor Intel Xeon E5-2670 v3 CPU. We choose 48 sets of matrices to be used as a benchmark suite. Experimental results show that CSR2 has a remarkable performance improvement compared with CSR5 and MKL. Compared to CSR5, CSR2 can achieve an average acceleration of 1.401 × (up to 1.861 ×). Compared to MKL, CSR2 can achieve an average acceleration of 1.261 × (up to 5.921 ×). In reality, for applications with multiple iterations, using our CSR2 can bring low-overhead format conversion and high-throughput computing performance.
What problem does this paper attempt to address?