Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey

Valentin Isaac-Chassande, Adrian Evans, Yves Durand, Frédéric Rousseau
DOI: https://doi.org/10.1145/3640542
IF: 1.444
2024-02-15
ACM Transactions on Architecture and Code Optimization
Abstract: Performance in scientific and engineering applications such as computational physics, algebraic graph problems, or Convolutional Neural Networks (CNNs) is dominated by the manipulation of large sparse matrices, that is, matrices with a large number of zero elements. Specialized software using sparse-matrix data formats has been optimized for the main kernels of interest, SpMV (sparse matrix-vector) and SpMSpM (sparse matrix-sparse matrix) multiplication, but because of indirect memory accesses, performance is still limited by the memory hierarchy of conventional computers. Recent work shows that dedicated hardware accelerators can reduce memory traffic and improve the execution time of sparse matrix multiplication compared to the best software implementations. The performance of these sparse hardware accelerators depends on the choice of sparse format (COO, CSR, etc.), the algorithm (inner-product, outer-product, Gustavson), and many hardware design choices. In this article, we propose a systematic survey that identifies the design choices of state-of-the-art accelerators for sparse matrix multiplication kernels. We introduce the necessary concepts and then present, compare, and classify the main sparse accelerators in the literature, using consistent notation. Finally, we propose a taxonomy for these accelerators to help future designers make the best choices depending on their objectives.
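The formats and algorithms named in the abstract can be illustrated with a small software sketch. The Python below is a hypothetical illustration, not the survey's code: all function names (`coo_to_csr`, `csr_spmv`, `inner_product`, `outer_product`, `gustavson`) are ours, and Python dictionaries stand in for compressed formats in the SpMSpM part. It converts COO triplets to CSR, runs a CSR SpMV (where `x[col_idx[k]]` is the indirect access that stresses the memory hierarchy), and computes SpMSpM with the three loop orders. It deliberately ignores the hardware concerns (memory traffic, merge units, on-chip buffering) that the surveyed accelerators address.

```python
from collections import defaultdict

# Hypothetical sketch: names and the dict-based matrix representation are
# illustrative assumptions, not taken from the survey.

def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (row_ptr, col_idx, values)."""
    # Sort entries by (row, col) so each row's nonzeros are contiguous.
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    col_idx = [cols[k] for k in order]
    values = [vals[k] for k in order]
    # Count nonzeros per row, then prefix-sum into row pointers.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx, values

def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x with A in CSR; x is read indirectly through col_idx."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def to_rows(entries):
    """Group {(i, j): v} by row: rows[i][j] = v (CSR-like view)."""
    rows = defaultdict(dict)
    for (i, j), v in entries.items():
        rows[i][j] = v
    return rows

def to_cols(entries):
    """Group {(i, j): v} by column: cols[j][i] = v (CSC-like view)."""
    cols = defaultdict(dict)
    for (i, j), v in entries.items():
        cols[j][i] = v
    return cols

def inner_product(A, B):
    """C[i, j] = dot(row i of A, column j of B): intersect index sets."""
    Ar, Bc = to_rows(A), to_cols(B)
    C = {}
    for i, a_row in Ar.items():
        for j, b_col in Bc.items():
            common = a_row.keys() & b_col.keys()
            if common:
                C[(i, j)] = sum(a_row[k] * b_col[k] for k in common)
    return C

def outer_product(A, B):
    """C = sum over k of (column k of A) x (row k of B); merge partials."""
    Ac, Br = to_cols(A), to_rows(B)
    C = defaultdict(float)
    for k in Ac.keys() & Br.keys():
        for i, a in Ac[k].items():
            for j, b in Br[k].items():
                C[(i, j)] += a * b
    return dict(C)

def gustavson(A, B):
    """Row i of C = sum over k in row i of A of A[i, k] * (row k of B)."""
    Ar, Br = to_rows(A), to_rows(B)
    C = {}
    for i, a_row in Ar.items():
        acc = defaultdict(float)  # sparse accumulator for row i of C
        for k, a in a_row.items():
            for j, b in Br.get(k, {}).items():
                acc[j] += a * b
        for j, v in acc.items():
            C[(i, j)] = v
    return C
```

For example, `coo_to_csr(2, [1, 0, 0], [1, 2, 0], [3.0, 2.0, 1.0])` yields `row_ptr == [0, 2, 3]`, and multiplying by `x = [1.0, 1.0, 1.0]` gives `[3.0, 3.0]`. The three SpMSpM functions return the same result; they differ only in loop order, which determines whether the main cost is intersecting index sets (inner product), merging partial products (outer product), or maintaining a per-row accumulator (Gustavson).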
Subject categories: Computer Science, Theory & Methods; Computer Science, Hardware & Architecture
What problem does this paper attempt to address?