Research on SpMV Implementation and Vector X Hit Rate Optimization for SW26010p Many-Core Platform
Mengfei Ma, Zhiqiang Wei, Xiaoli Jing, Dongning Jia, Jiali Xu, Yucheng Wang, Chengfeng Zhang, Hengmin Han
DOI: https://doi.org/10.21203/rs.3.rs-1940073/v1
2022-01-01
Abstract: In recent years, with the development of computer hardware, the supercomputer industry has entered a stage of rapid development, and its architecture has evolved from traditional multi-core to many-core and heterogeneous many-core designs. Among these, the Sunway many-core platform series, which has completely independent intellectual property rights, is representative of China’s heterogeneous many-core supercomputing processors. As a computing kernel, SpMV (sparse matrix-vector multiplication) is of great significance in scientific and engineering computing, and its performance often has a great impact on the overall performance of applications. This article analyzes the master-slave acceleration architecture of the SW26010p many-core processor and the implementation of sparse matrices in the CSR storage format on this platform. Because the memory of each slave core of the SW26010p is limited, it cannot hold the vector data required by large-scale SpMV, resulting in long memory access times and reduced performance. To solve this problem and optimize the computing performance of SpMV, this paper studies optimization strategies for SpMV on the SW26010p many-core platform. Firstly, we propose a method of assigning tasks by the number of rows in which the non-zero elements are located to solve the load balancing problem among slave cores. Secondly, we propose an adaptive memory allocation algorithm for LDM to make the best use of LDM memory. Thirdly, based on the refined division of the LDM space, we propose several algorithms to improve the hit rate of vector x, including dynamic and static double cache algorithms based on LRU and LRU-k for the slave core architecture, and a dynamic and static cache eviction algorithm based on ARC for the slave core architecture.
The performance of SpMV is thereby optimized by reducing communication time and improving the ratio of computation to memory access. Finally, several representative sparse matrices are selected from the Matrix Market matrix set and tested, and the performance of the proposed algorithms is analyzed. The results show that, compared with the traditional method, our scheme greatly improves both the overall hit rate of vector x and the master-slave speedup: the maximum speedup exceeds 20 times and the average speedup reaches 10.5 times. The optimization methods adopted in this paper can also serve as a reference for other complex applications on the SW26010p.
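To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' SW26010p implementation. It shows a reference CSR SpMV, one possible way to split rows among workers so that each holds a similar number of non-zeros, and a plain LRU simulation that measures how often an access to x hits a small cache. All function names, the nearest-boundary partitioning rule, and the cache capacity parameter are assumptions made for illustration; the paper's LDM allocation, LRU-k, and ARC variants are not reproduced here.

```python
from bisect import bisect_left
from collections import OrderedDict


def spmv_csr(row_ptr, col_idx, vals, x):
    """Reference CSR SpMV: y[i] is the sum of vals[k] * x[col_idx[k]] over row i."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def partition_rows_by_nnz(row_ptr, n_workers):
    """Split rows into contiguous blocks with roughly equal non-zero counts.

    Returns n_workers + 1 row boundaries b; worker w owns rows [b[w], b[w + 1]).
    Assumption: each cut is placed at the row boundary nearest the ideal share.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for w in range(1, n_workers):
        target = nnz * w / n_workers
        r = bisect_left(row_ptr, target)
        # Step back one row if the previous boundary is closer to the target.
        if r > 0 and target - row_ptr[r - 1] < row_ptr[r] - target:
            r -= 1
        bounds.append(min(max(r, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return bounds


def lru_hit_rate(col_idx, start, end, capacity):
    """Simulate an LRU cache of x entries over the non-zeros in [start, end)."""
    cache = OrderedDict()
    hits = 0
    for k in range(start, end):
        j = col_idx[k]
        if j in cache:
            hits += 1
            cache.move_to_end(j)  # mark x[j] as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[j] = True
    total = end - start
    return hits / total if total else 0.0


# Hypothetical 4x4 example matrix:
# [1 0 2 0]
# [0 3 0 0]
# [4 0 5 6]
# [0 0 0 7]
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 2, 1, 0, 2, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

y = spmv_csr(row_ptr, col_idx, vals, x)
bounds = partition_rows_by_nnz(row_ptr, 2)
rate_small = lru_hit_rate(col_idx, 0, len(col_idx), capacity=2)
rate_large = lru_hit_rate(col_idx, 0, len(col_idx), capacity=4)
```

In this toy example the two workers receive 3 and 4 non-zeros, and enlarging the simulated cache from 2 to 4 entries raises the hit rate for x. This only illustrates why the split of limited slave-core memory matters for the hit rate of vector x; it says nothing about the performance figures reported in the abstract.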