A new segmentation-based GPU-accelerated sparse matrix-vector multiplication

Kai He, Sheldon X. D. Tan, E. Tlelocuautle, Hai Wang, He Tang
DOI: https://doi.org/10.1109/MWSCAS.2014.6908589
2014-01-01
Circuits and Systems
Abstract: In this paper, we propose a new fast parallel sparse matrix-vector multiplication (SpMV) algorithm for GPU platforms. The new algorithm, called segSpMV, is based on the compressed sparse row (CSR) format and can be applied to a wide range of computational applications with both structured and unstructured matrices. The SpMV operation has a very low computation-to-communication ratio and is bandwidth-limited. The new SpMV algorithm tries to reduce memory accesses by partitioning the rows, whose nonzero patterns are irregular in general, into a number of fixed-length segments. As a result, both the multiplication and summation phases benefit from coalesced memory access and can be completed in a single kernel launch. The summation phase can be further improved by using GPU reduction techniques for large segment lengths. The resulting SpMV method consistently outperforms all published algorithms and the SpMV method in the recent CUSPARSE library on a set of public matrix benchmarks.
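
The segmentation idea described in the abstract can be illustrated with a small sequential sketch. This is not the authors' GPU kernel: it only shows how CSR rows might be split into fixed-length segments, multiplied segment by segment, and then summed per row. The function names (`csr_to_segments`, `seg_spmv`), the zero-padding of the last segment in each row, and the default segment length are assumptions made for illustration; the abstract does not specify them.

```python
def csr_to_segments(row_ptr, col_idx, vals, seg_len):
    """Split every CSR row into fixed-length segments.

    Assumption: the last segment of a row is padded with zero values
    (and column index 0) so that all segments have the same length.
    """
    seg_vals, seg_cols, seg_row = [], [], []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        for s in range(start, end, seg_len):
            stop = min(s + seg_len, end)
            pad = seg_len - (stop - s)
            seg_vals.append(list(vals[s:stop]) + [0.0] * pad)
            seg_cols.append(list(col_idx[s:stop]) + [0] * pad)
            seg_row.append(row)  # remember which row owns this segment
    return seg_vals, seg_cols, seg_row


def seg_spmv(row_ptr, col_idx, vals, x, seg_len=4):
    """Compute y = A @ x for a CSR matrix A using fixed-length segments."""
    seg_vals, seg_cols, seg_row = csr_to_segments(row_ptr, col_idx, vals, seg_len)
    y = [0.0] * (len(row_ptr) - 1)
    for v, c, r in zip(seg_vals, seg_cols, seg_row):
        # Multiplication phase: every segment does the same amount of work.
        products = [v[k] * x[c[k]] for k in range(seg_len)]
        # Summation phase: reduce the segment, then accumulate into its row.
        y[r] += sum(products)
    return y


# 3x3 example matrix: [[1, 0, 2], [0, 0, 0], [3, 4, 5]]
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
print(seg_spmv(row_ptr, col_idx, vals, x, seg_len=2))  # [7.0, 0.0, 26.0]
```

In this sketch each segment has identical length, which is the property that would let neighbouring GPU threads read neighbouring memory locations. On a GPU the two loops would run in parallel within one kernel, and the per-segment sum could use a parallel reduction when segments are long.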