Efficiently Running SpMV on Multi-core DSPs for Banded Matrix

Deshun Bi, Shengguo Li, Yichen Zhang, Xiaojian Yang, Dezun Dong
DOI: https://doi.org/10.1007/978-981-97-0808-6_12
2024-01-01
Abstract: Sparse matrix-vector multiplication (SpMV) plays a pivotal role in large-scale scientific computing. Despite the increasing use of low-power multi-core digital signal processors (DSPs) in high-performance computing (HPC) systems, optimizing SpMV on these platforms has been largely overlooked. This paper introduces the FT-M7032, a new heterogeneous multi-core CPU-DSP processor for high-performance computing. The FT-M7032 provides programmable memory units at multiple levels, but using these units effectively is a challenge. To address this, we evaluate the transfer capability between the different memory units and use the results to decide how matrix elements are mapped to storage units. Based on this evaluation, we propose an efficient parallel implementation, SpMV_Band, designed specifically for banded matrices. Furthermore, we devise a computation pipeline that hides memory access overhead by overlapping data transfers with computation. To evaluate our approach, we compare its performance with a baseline executed on the general-purpose CPU cores of the FT-M7032. Experimental results show that our techniques achieve a speedup of 2.0x over the CPU-based baseline.
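As a rough illustration of the two ideas named in the abstract (SpMV on a banded matrix, and a pipeline that fetches the next block of data while the current one is being processed), the following is a minimal sketch in plain Python. It is not the paper's SpMV_Band implementation: the diagonal storage layout, the function names `to_band` and `spmv_band_tiled`, the `tile` parameter, and the two ping-pong buffers are all assumptions made for illustration. In Python the "transfer" and the computation run one after the other; on a DSP the load would be an asynchronous transfer into on-chip memory.

```python
def to_band(dense, kl, ku):
    """Pack a dense square matrix into diagonal storage.

    Assumed layout: band[d + kl][i] holds A[i][i + d] for -kl <= d <= ku.
    """
    n = len(dense)
    band = [[0.0] * n for _ in range(kl + ku + 1)]
    for i in range(n):
        for d in range(-kl, ku + 1):
            j = i + d
            if 0 <= j < n:
                band[d + kl][i] = dense[i][j]
    return band


def spmv_band_tiled(band, kl, ku, x, tile=2):
    """Compute y = A @ x for a banded A, one row tile at a time."""
    n = len(x)
    y = [0.0] * n
    starts = list(range(0, n, tile))

    def load(start):
        # Stand-in for a transfer of one row tile of every diagonal
        # from main memory into a local buffer (hypothetical helper).
        end = min(start + tile, n)
        return start, end, [diag[start:end] for diag in band]

    buffers = [None, None]  # two ping-pong buffers
    if starts:
        buffers[0] = load(starts[0])
    for t in range(len(starts)):
        # Request the next tile before computing on the current one.
        if t + 1 < len(starts):
            buffers[(t + 1) % 2] = load(starts[t + 1])
        start, end, local = buffers[t % 2]
        for i in range(start, end):
            acc = 0.0
            for d in range(-kl, ku + 1):
                j = i + d
                if 0 <= j < n:  # skip entries outside the matrix
                    acc += local[d + kl][i - start] * x[j]
            y[i] = acc
    return y


# Tridiagonal example (kl = ku = 1).
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
x = [1, 2, 3, 4]
y = spmv_band_tiled(to_band(A, 1, 1), 1, 1, x)
print(y)  # [0.0, 0.0, 0.0, 5.0]
```

Storing the matrix by diagonals makes every tile a contiguous slice of each diagonal, which is what makes block-wise transfers straightforward in this sketch; whether the paper uses this exact layout is not stated in the abstract.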