Accelerating Pqmrcgstab Algorithm On Gpu

Canqun Yang,Zhen Ge,Juan Chen,Feng Wang,Qiang Wu
DOI: https://doi.org/10.1145/1531666.1531670
2009-01-01
Abstract:The general computations on GPU are becoming more and more popular because of GPU's powerful computing ability. In this paper, how to use GPU to accelerate sparse linear system solver, preconditioned QMRCGSTAB (PQMRCGSTAB for short), is our concern. We implemented a GPU-accelerated PQMRCGSTAB algorithm on NVIDIA Tesla C870. Three optimization methods are used to improve the performance of the GPU-accelerated PQMRCGSTAB algorithm: reorganizing data by data packing and matrix mending to obtain higher memory bandwidths using texture memory instead of shared memory exploiting the kernel mergence. The experimental results show that the GPU-accelerated PQMRCGSTAB algorithm achieves the peak performance of 17.7 GFLOPS on C870. Compared with a MPI version PQMRCGSTAB algorithm executed on in Intel Xeon quad-core CPU, GPU-accelerated PQMRCGSTAB can reach the speedup of five for a 640K x 640K matrix.
What problem does this paper attempt to address?