Toward Efficient Structured-Grid Triangular Solver on Sunway Many-Core Processors

Jianjiang Li, Jiabi Liang, Wei Xue, Zhengding Hu, Lin Li, Jinliang Shi
DOI: https://doi.org/10.1007/s11227-023-05802-2
2024-01-01
Abstract: The sparse triangular solver (SpTRSV) is widely used in scientific and engineering applications. The structured-grid triangular solver with regular dependencies (STRSV) is a special case of SpTRSV. General SpTRSV implementations that disregard the regularity of the matrix are ill-suited to this problem. This paper proposes swStructTRSV, an efficient parallel algorithm for STRSV on the SW26010, a many-core processor independently designed in China. The algorithm exploits the fine-grained, low-latency communication of the SW26010 to reduce synchronization waiting time, maximizes the regularity of memory accesses to improve memory bandwidth, and overlaps memory access with computation. Moreover, the idea can be extended to incomplete LU (ILU) factorization, which has the same dependencies. Experimental results on one core group of the SW26010 (64 cores arranged in an 8 × 8 network) show that swStructTRSV achieves an average speedup of more than 30 over the sequential version. swStructTRSV on SW26010 achieves solving speedups of 2.2 and 6.3 over the fast STRSV (fSpTRSV) previously implemented on SW26010 and MKL on Intel Xeon Gold 6132, respectively. swStructTRSV also significantly outperforms cuSparse on NVIDIA TITAN RTX in terms of overall execution time.
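To make the notion of "regular dependencies" concrete, the following is a minimal sketch, not the swStructTRSV algorithm itself. It assumes a 2D grid with a 5-point stencil, so the lower-triangular system couples each point only to its west and south neighbours. The function names (`strsv_sequential`, `strsv_wavefront`) and the coefficient layout are illustrative assumptions, not taken from the paper. The sketch shows that all points on the same anti-diagonal (i + j = k) are mutually independent, which is the kind of structure a parallel structured-grid solver can exploit.

```python
def _solve_point(diag, west, south, b, x, i, j):
    """Forward-substitution update for grid point (i, j)."""
    acc = b[i][j]
    if j > 0:
        acc -= west[i][j] * x[i][j - 1]    # dependency on west neighbour
    if i > 0:
        acc -= south[i][j] * x[i - 1][j]   # dependency on south neighbour
    return acc / diag[i][j]


def strsv_sequential(diag, west, south, b):
    """Solve L x = b by visiting grid points in lexicographic order."""
    ny, nx = len(b), len(b[0])
    x = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            x[i][j] = _solve_point(diag, west, south, b, x, i, j)
    return x


def strsv_wavefront(diag, west, south, b):
    """Solve L x = b one anti-diagonal (wavefront) at a time.

    Points with i + j == k depend only on points with i + j == k - 1,
    so the inner loop could run in parallel (here it is serial).
    """
    ny, nx = len(b), len(b[0])
    x = [[0.0] * nx for _ in range(ny)]
    for k in range(ny + nx - 1):
        for i in range(max(0, k - nx + 1), min(ny, k + 1)):
            j = k - i
            x[i][j] = _solve_point(diag, west, south, b, x, i, j)
    return x


def constant_grid(ny, nx, value):
    """Helper (hypothetical): an ny-by-nx grid filled with one value."""
    return [[value] * nx for _ in range(ny)]
```

Both orderings perform the same arithmetic per point, so they return identical results; only the order in which points are visited differs. A real implementation on a many-core processor would additionally have to handle data movement and synchronization between wavefronts, which this sketch leaves out.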