Optimizing SpMV on Heterogeneous Multi-Core DSPs Through Improved Locality and Vectorization

Deshun Bi, Shengguo Li, Dezun Dong, Peng Zhang, Jianbin Fang
DOI: https://doi.org/10.1145/3673038.3673061
2024-01-01
Abstract: Sparse matrix-vector multiplication (SpMV) is widely used in large-scale scientific computing and engineering. However, optimizing SpMV for high-performance digital signal processors (DSPs) has received limited attention. We present HaLAV, a method to accelerate SpMV on CPU-DSP heterogeneous platforms, using the FT-M7032 DSP platform as a case study. HaLAV partitions the input matrix into ‘dense’ and ‘sparse’ parts through column reordering. For the dense part, HaLAV automatically selects a storage format optimized for vectorization and runs the computation on the DSP. At the same time, it offloads the sparse part to the CPU, which processes it with the standard CSR algorithm. We evaluate our approach on the FT-M7032 platform and an Intel Xeon CPU. Experimental results show that our techniques achieve average speedups of 2.09× and 1.66× over the competing baselines on the FT-M7032 and the Xeon platform, respectively.
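
The dense/sparse split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how columns are classified or reordered, so the sketch assumes a simple rule (a column is "dense" if it holds at least `threshold` nonzeros) and only classifies entries instead of physically reordering columns. The names `csr_spmv`, `split_by_column_density`, `hybrid_spmv`, and `threshold` are all hypothetical, and a plain CSR kernel stands in for the vectorized DSP kernel.

```python
from collections import Counter


def csr_spmv(indptr, indices, data, x):
    """Standard CSR SpMV: returns y = A @ x as a list of floats."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def split_by_column_density(indptr, indices, data, threshold):
    """Split a CSR matrix into two CSR matrices of the same shape.

    ASSUMPTION (not from the paper): an entry goes to the 'dense' part
    if its column holds at least `threshold` nonzeros, else to 'sparse'.
    Each part is returned as an (indptr, indices, data) tuple.
    """
    col_nnz = Counter(indices)
    dense = ([0], [], [])
    sparse = ([0], [], [])
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            part = dense if col_nnz[indices[k]] >= threshold else sparse
            part[1].append(indices[k])
            part[2].append(data[k])
        # Close the current row in both parts.
        dense[0].append(len(dense[1]))
        sparse[0].append(len(sparse[1]))
    return dense, sparse


def hybrid_spmv(indptr, indices, data, x, threshold=2):
    """Compute y = A @ x as the sum of the dense-part and sparse-part products."""
    dense, sparse = split_by_column_density(indptr, indices, data, threshold)
    y_dense = csr_spmv(*dense, x)    # stand-in for a vectorized DSP kernel
    y_sparse = csr_spmv(*sparse, x)  # CPU side: plain CSR
    return [a + b for a, b in zip(y_dense, y_sparse)]


# 3x3 example matrix [[1, 0, 2], [0, 0, 3], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = hybrid_spmv(indptr, indices, data, x, threshold=3)
```

Because the two parts hold disjoint sets of nonzeros, their partial results sum to the same vector that a single CSR pass over the whole matrix would produce; on a real heterogeneous platform the two products could run concurrently on different devices.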