A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization
Tian Xia, Gelin Fu, Chenyang Li, Zhongpei Luo, Lucheng Zhang, Ruiyang Chen, Wenzhe Zhao, Nanning Zheng, Pengju Ren
DOI: https://doi.org/10.1109/tpds.2022.3225230
IF: 5.3
2022-12-24
IEEE Transactions on Parallel and Distributed Systems
Abstract: Sparse Matrix-Vector Multiplication (SpMV) is important in scientific and industrial applications and remains a well-known challenge for modern CPUs because of its high sparsity and irregularity. Many researchers try to improve SpMV performance by designing dedicated data formats and computation patterns. However, out-of-order superscalar CPUs have complex micro-architectures in which software and hardware factors interact with and constrain one another in complicated ways. It is therefore hard to study systematically how an optimization method affects overall performance, because its benefits may be undermined by other factors. In this paper, we thoroughly study the execution of SpMV on modern CPUs and propose a comprehensive performance model that reveals the critical factors and their relationships. Specifically, we first study the coding characteristics of SpMV kernels to identify the key factors worthy of attention. We then model the execution of SpMV as two overlapped parts, CPU pipeline and memory latency, each carefully modeled with its related hardware and software factors. We also model SIMD performance in terms of the usage of specific SIMD instructions and vector registers. Experiments show that our model matches the actual execution on real-world processors. Guided by the model, we propose SpV8, a novel SpMV kernel that optimizes the critical factors to improve computation efficiency and memory bandwidth. Experiments on Intel/AMD x86 and ARM AArch64 platforms show that SpV8 outperforms several state-of-the-art approaches by large margins, achieving an average speedup of 3.4× over Intel Math Kernel Library and .4× over the best existing approach. These results indicate that the proposed model can provide valuable guidance for efficient SpMV optimization.
Computer Science, Theory & Methods; Engineering, Electrical & Electronic
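
For readers unfamiliar with the operation named in the abstract, the following is a minimal sketch of a scalar SpMV loop over the widely used compressed sparse row (CSR) layout. It is not SpV8 and is not taken from the paper; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions. It is meant only to show where the irregularity mentioned in the abstract comes from: the read `x[col_idx[k]]` is an indirect, data-dependent access.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch).

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: the position read from x depends on col_idx[k],
            # so the access pattern follows the sparsity structure of A.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example:
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)  # [7.0, 9.0, 14.0]
```

Row lengths vary from row to row (here 2, 1, and 2 nonzeros), so the inner loop has a data-dependent trip count. That is one reason such kernels are hard to vectorize and hard to model on out-of-order CPUs.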