Performance Optimization for SpMV on Multi-GPU Systems Using Threads and Multiple Streams

Ping Guo, Changjiang Zhang
DOI: https://doi.org/10.1109/sbac-padw.2016.20
2016-01-01
Abstract: Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineering applications. This paper presents an optimization strategy to improve SpMV performance on multi-GPU systems by adopting OpenMP threads and multiple CUDA streams. We propose an efficient scheme that uses OpenMP threads to control multiple GPUs so that they jointly complete SpMV computations. Moreover, we adopt a streamed approach to increase concurrency and further improve SpMV performance. We use HYB (Hybrid ELL/COO), a hybrid sparse storage format, to demonstrate the effectiveness of the proposed approach. Our experimental results show that our approach achieves an average speedup of 3.80 over the existing SpMV implementation on a single GPU.
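To make the two ideas in the abstract concrete, here is a minimal CPU-only sketch of a HYB (ELL plus COO) product with the rows split across workers. It is not the authors' implementation: the function names, the `ell_width` parameter, and the equal-sized row-block partitioning are all assumptions made for illustration, Python threads merely stand in for one OpenMP thread per GPU, and CUDA streams are not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor


def to_hyb(dense, ell_width):
    """Split a dense matrix (list of rows) into an ELL part and a COO part.

    The first `ell_width` nonzeros of each row go into fixed-width ELL arrays
    (padded with column index -1); any remaining nonzeros spill into COO.
    """
    ell_cols, ell_vals, coo = [], [], []
    for i, row in enumerate(dense):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        head, tail = nonzeros[:ell_width], nonzeros[ell_width:]
        pad = ell_width - len(head)
        ell_cols.append([j for j, _ in head] + [-1] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        coo.extend((i, j, v) for j, v in tail)
    return ell_cols, ell_vals, coo


def hyb_spmv(ell_cols, ell_vals, coo, x):
    """Compute y = A @ x for a matrix stored in HYB form."""
    y = [0.0] * len(ell_cols)
    # ELL part: regular, fixed-width rows; padded slots are skipped.
    for i, (cols, vals) in enumerate(zip(ell_cols, ell_vals)):
        for j, v in zip(cols, vals):
            if j >= 0:
                y[i] += v * x[j]
    # COO part: the irregular overflow entries.
    for i, j, v in coo:
        y[i] += v * x[j]
    return y


def spmv_partitioned(dense, x, num_workers, ell_width):
    """Split rows into contiguous blocks and process one block per worker.

    Each worker is a stand-in (assumption) for a host thread driving one GPU.
    """
    chunk = max(1, (len(dense) + num_workers - 1) // num_workers)
    blocks = [dense[s:s + chunk] for s in range(0, len(dense), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda b: hyb_spmv(*to_hyb(b, ell_width), x), blocks)
        # Row blocks are independent, so the results simply concatenate.
        return [value for part in parts for value in part]


A = [
    [1, 0, 2, 0],
    [0, 3, 0, 0],
    [4, 5, 6, 7],
    [0, 0, 0, 8],
]
x = [1, 2, 3, 4]
print(spmv_partitioned(A, x, num_workers=2, ell_width=2))  # [7.0, 6.0, 60.0, 32.0]
```

Splitting by rows means each worker needs the whole input vector but writes a disjoint slice of the output, so no synchronization is needed beyond the final gather. In this example the third row has four nonzeros, so with `ell_width=2` two of them land in the COO part.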