Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication

Hua Huang, Edmond Chow
DOI: https://doi.org/10.1109/tpds.2024.3452478
IF: 5.3
2024-09-20
IEEE Transactions on Parallel and Distributed Systems
Abstract: We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will multiply the sparse matrix by multiple dense vectors at the same time, to exploit the computational efficiencies therein. But such approaches generally utilize the same sparse matrix partitioning as if multiplying by a single vector. This article explores the design space of parallelizing SpMM and shows that a coarser grain partitioning of the matrix combined with a column-wise partitioning of the block of vectors can often require less communication volume and achieve higher SpMM performance. An algorithm is presented that chooses a process grid geometry for a given number of processes to optimize the performance of parallel SpMM. The algorithm can augment existing graph partitioners by utilizing the additional concurrency available when multiplying by multiple dense vectors to further reduce communication.
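The trade-off described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' algorithm: it assumes a square sparse matrix given as a list of (row, column) nonzero positions, a plain contiguous row-block partition standing in for a graph partitioner, and a volume count that only tracks remote rows of the dense block that each row block must fetch. It ignores any cost of replicating the sparse matrix across column groups. All function names are hypothetical.

```python
def grid_geometries(num_procs, num_vecs):
    """All (p_row, p_col) with p_row * p_col == num_procs and p_col <= num_vecs."""
    return [(d, num_procs // d)
            for d in range(1, num_procs + 1)
            if num_procs % d == 0 and num_procs // d <= num_vecs]


def block_owner(index, size, parts):
    """Owner of `index` when `size` items are split into `parts` contiguous blocks."""
    base, extra = divmod(size, parts)
    boundary = extra * (base + 1)  # the first `extra` blocks hold base + 1 items
    if index < boundary:
        return index // (base + 1)
    return extra + (index - boundary) // base


def ghost_rows(nonzeros, n, p_row):
    """Number of (row block, remote row of B) pairs needed under a row-block split."""
    needed = set()
    for i, j in nonzeros:
        owner_i = block_owner(i, n, p_row)
        if block_owner(j, n, p_row) != owner_i:
            needed.add((owner_i, j))
    return len(needed)


def comm_volume(nonzeros, n, num_vecs, p_row):
    """Toy volume: every ghost row is fetched once per dense column.

    Each of the p_col column groups fetches only its own slice of a ghost row,
    so summed over the groups the total is ghost_rows * num_vecs words.
    """
    return ghost_rows(nonzeros, n, p_row) * num_vecs


def volumes_by_geometry(nonzeros, n, num_vecs, num_procs):
    """Toy communication volume for every admissible process grid geometry."""
    return {(p_row, p_col): comm_volume(nonzeros, n, num_vecs, p_row)
            for p_row, p_col in grid_geometries(num_procs, num_vecs)}


if __name__ == "__main__":
    # Tridiagonal 8 x 8 pattern, 4 dense vectors, 4 processes (illustrative values).
    n = 8
    pattern = [(i, j) for i in range(n) for j in (i - 1, i, i + 1) if 0 <= j < n]
    for geometry, volume in volumes_by_geometry(pattern, n, 4, 4).items():
        print(geometry, volume)
```

In this model, using fewer row blocks and more column groups for the same process count never increases the counted volume, which is the effect the abstract attributes to coarser matrix partitioning. A realistic geometry choice would also have to weigh load balance, memory, and the cost of distributing the sparse matrix, none of which this sketch models.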
computer science, theory & methods; engineering, electrical & electronic