Atomic Reduction Based Sparse Matrix-Transpose Vector Multiplication on GPUs

Yuan Tao, Yangdong Deng, Shuai Mu, Mingfa Zhu, Limin Xiao, Li Ruan, Zhibin Huang
DOI: https://doi.org/10.1109/padsw.2014.7097920
2014-01-01
Concurrency and Computation Practice and Experience
Abstract: Sparse Matrix-Transpose Vector Product (SMTVP) is a frequently used computation pattern in high-performance computing applications. Current linear algebra packages typically compute it by transposing the matrix and then performing a Sparse Matrix-Vector Product (SMVP). However, the transposition step can be a serious bottleneck on modern parallel computing platforms. Previous work proposed a relatively complex data structure for computing SMTVP efficiently on multi-core CPUs, but it proved inefficient on GPUs. In this work, we show that the Compressed Sparse Row (CSR) based SMVP algorithm can also be efficient for SMTVP computation on modern GPUs. The proposed method uses atomic operations to perform the reduction in each inner product between a row of the transposed matrix and the vector. Experimental results show that this simple technique can outperform the SMTVP flow of transposition plus SMVP provided in the CUSPARSE package by up to 405-fold.
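The abstract does not show the kernel itself, so the following is only a minimal sequential sketch of the general idea, assuming a standard CSR layout. The function name `smtvp_csr` and the array names `row_ptr`, `col_idx`, and `vals` are illustrative and not taken from the paper. Each stored entry A[i][j] contributes A[i][j] * x[i] to y[j], so y = A^T x can be accumulated directly from the CSR arrays of A without building the transpose. On a GPU, rows would presumably be processed in parallel, and the scattered update to `y` is where an atomic addition would be needed, because different rows can write to the same output element.

```python
def smtvp_csr(n_cols, row_ptr, col_idx, vals, x):
    """Compute y = A^T x with A stored in CSR, without transposing A."""
    y = [0.0] * n_cols
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):  # assumed GPU mapping: one thread (or warp) per row i
        xi = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Scatter-add into the output. In a parallel GPU kernel this
            # update would have to be atomic (e.g. an atomic add on y[j]),
            # since several rows may hit the same column index.
            y[col_idx[k]] += vals[k] * xi
    return y


# Example matrix (hypothetical data):
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = smtvp_csr(3, row_ptr, col_idx, vals, x)
print(y)  # [13.0, 6.0, 17.0]
```

The trade-off in this formulation is that reads of `vals` and `col_idx` stay contiguous, as in ordinary CSR SMVP, while writes to `y` become irregular and contended. That contention is what the atomic reduction resolves.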