A W-cycle Algorithm for Efficient Batched SVD on GPUs

Junmin Xiao, Qing Xue, Hui Ma, Xiaoyang Zhang, Guangming Tan
DOI: https://doi.org/10.1145/3503221.3508443
2022-01-01
Abstract: As a fundamental factorization operation, the singular value decomposition (SVD) plays a paramount role in a broad range of domains such as scientific computing and machine learning. Because the factorization of small matrices is a computational bottleneck in real-world applications, many GPU-accelerated batched SVD algorithms have been investigated recently. However, these algorithms fail to achieve a balance between data locality and parallelism because their workflows depend on the size of each matrix. In this work, we propose a matrix-size-independent W-cycle algorithm to accelerate the batched one-sided Jacobi SVD on GPUs, which successfully strikes the balance between data locality and parallelism. The experimental evaluation demonstrates that the proposed algorithm achieves a 4.5X performance speedup on average over the state-of-the-art cuSOLVER.
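For readers unfamiliar with the one-sided Jacobi method that the abstract builds on, the following is a minimal CPU sketch of the textbook (Hestenes) variant applied to a batch of small matrices. It is not the W-cycle algorithm proposed in the paper, and it says nothing about GPU data locality; the function names `one_sided_jacobi_singular_values` and `batched_singular_values`, the tolerance, and the sweep limit are all illustrative assumptions.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values of a small dense matrix given as a list of rows.

    Textbook one-sided Jacobi: rotate pairs of columns until all columns
    are mutually orthogonal; the column norms are then the singular values.
    """
    m, n = len(a), len(a[0])
    # Column-major working copy, so each column is one Python list.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break  # a full sweep without rotations means convergence
    norms = (math.sqrt(sum(x * x for x in col)) for col in cols)
    return sorted(norms, reverse=True)


def batched_singular_values(batch):
    """Apply the sketch above independently to every matrix in the batch."""
    return [one_sided_jacobi_singular_values(a) for a in batch]
```

Each matrix in the batch is processed independently, which is what makes the batched problem attractive for GPUs; how the column pairs and matrices are mapped onto GPU threads and memory is exactly the design question the paper addresses, and this sketch leaves it open.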