A two-stage parallel method on GPU based on hybrid-compression-format for diagonal matrix

Huanyu Cui, Nianbin Wang, Qilong Han, Ye Wang, Jiahang Li
DOI: https://doi.org/10.1002/cpe.7887
2024-01-01
Abstract: Sparse matrix-vector multiplication (SpMV) is an important computational kernel in traditional high-performance computing and in emerging data-intensive applications. For diagonal sparse matrices stored in the DIA (Diagonal) format, a large number of zeros must often be filled in to maintain the diagonal structure. This zero padding consumes additional computing and memory resources, which degrades the parallel performance of SpMV and causes computing and storage redundancy. To address these deficiencies of the DIA format, this paper presents a Two-stage parallel SpMV method that distributes the data of the diagonal part and the irregular part of the matrix to different CUDA kernels. Because different compression methods are designed for the different matrix forms, the Two-stage method adopts a partition-based hybrid format of DIA and CSR (HPDC) to ensure load balancing among computing resources and continuity of data access along the diagonals. In addition, the standard deviation among blocks is used as a criterion to obtain the optimal number of blocks and the distribution of data. Experiments were conducted on matrices from the Florida data set. Compared with DIA, cuSPARSE-CSR, HDC, and BRCSD, the Two-stage method shortens execution time by 4×, 3.4×, 1.9×, and 1.15×, respectively.
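
The zero-padding cost of DIA and the idea of a DIA/CSR split can be illustrated with a small CPU-side sketch. This is not the paper's HPDC format or its CUDA kernels: it omits partitioning into blocks and the standard-deviation criterion, and the function names and the `min_fill` threshold are hypothetical choices made only for illustration. Diagonals that are mostly full go to a DIA part, and the remaining scattered nonzeros go to a CSR part.

```python
def to_dia(dense):
    """DIA storage: one length-n array per non-empty diagonal.

    data[k][i] holds dense[i][i + offsets[k]], or 0.0 as padding when
    that column index falls outside the matrix.
    """
    n = len(dense)
    offsets = sorted({j - i for i in range(n) for j in range(n) if dense[i][j] != 0})
    data = [[dense[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
            for off in offsets]
    return offsets, data


def dia_spmv(offsets, data, x):
    """y = A x for a matrix held in DIA form."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


def to_csr(dense):
    """CSR storage: row pointers, column indices, and nonzero values."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A x for a matrix held in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def split_hybrid(dense, min_fill=0.5):
    """Hypothetical split rule: a diagonal stays in the DIA part only if
    at least min_fill of its n padded slots are real nonzeros."""
    n = len(dense)
    counts = {}
    for i in range(n):
        for j in range(n):
            if dense[i][j] != 0:
                counts[j - i] = counts.get(j - i, 0) + 1
    keep = {off for off, c in counts.items() if c / n >= min_fill}
    dia_part = [[dense[i][j] if (j - i) in keep else 0.0 for j in range(n)]
                for i in range(n)]
    csr_part = [[dense[i][j] if (j - i) not in keep else 0.0 for j in range(n)]
                for i in range(n)]
    return dia_part, csr_part


# Tridiagonal matrix plus one stray entry in the bottom-left corner.
A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [9, 0, 1, 4]]
x = [1, 2, 3, 4]

# Pure DIA: the stray entry forces a whole extra padded diagonal.
full_offsets, full_data = to_dia(A)
y_dia = dia_spmv(full_offsets, full_data, x)
pure_dia_slots = len(full_offsets) * len(A)

# Hybrid: dense diagonals in DIA, the stray entry in CSR; results are summed.
dia_part, csr_part = split_hybrid(A)
h_offsets, h_data = to_dia(dia_part)
row_ptr, col_idx, vals = to_csr(csr_part)
y_hybrid = [a + b for a, b in zip(dia_spmv(h_offsets, h_data, x),
                                  csr_spmv(row_ptr, col_idx, vals, x))]
hybrid_slots = len(h_offsets) * len(A) + len(vals)
```

For this 4×4 example, pure DIA stores 16 slots for 11 nonzeros, while the split stores 12 DIA slots plus 1 CSR value and produces the same product. In a GPU setting, the two parts would be handled by separate kernels, which is the general idea the abstract describes.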