Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores

Fan Yuan, Xiaojian Yang, Shengguo Li, Dezun Dong, Chun Huang, Zheng Wang
DOI: https://doi.org/10.1109/tpds.2024.3372473
IF: 5.3
2024-03-19
IEEE Transactions on Parallel and Distributed Systems
Abstract: Multigrid preconditioned conjugate gradient (MGPCG) is commonly used in high-performance computing (HPC) workloads. However, MGPCG is notoriously challenging to optimize because most of its computation kernels are memory-bound, with low arithmetic intensity and non-trivial communication patterns among parallel processes. This article presents new techniques to improve the data locality and reduce the communication overhead of MGPCG. We first merge the kernels of multigrid (MG) and then develop an asynchronous neighboring communication algorithm to reduce data communication across parallel processes. We demonstrate the benefits of our approach by applying it to the high-performance conjugate gradient (HPCG) benchmark and integrating it with a real-life algebraic multigrid package. We test the resulting software implementations on three ARMv8 systems and one Intel Xeon system. Experimental results show that our approach leads to a 1.62x-2.54x speedup over the engineer- and vendor-tuned HPCG implementations across various workloads and platforms.
Computer Science, Theory & Methods; Engineering, Electrical & Electronic
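
For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of a preconditioned conjugate gradient loop. It is not the authors' implementation and does not reproduce HPCG: the 1D Poisson operator, the function names, and the Jacobi (diagonal) preconditioner are all assumptions chosen for brevity, with the Jacobi step standing in where MGPCG would apply a multigrid cycle. The sketch is serial, so it shows only the structure of the kernels (matrix-vector product, dot products, vector updates, preconditioner application) and none of the inter-process communication the abstract refers to.

```python
# Preconditioned conjugate gradient (PCG) on a small 1D Poisson problem.
# Assumption: a Jacobi (diagonal) preconditioner stands in for the multigrid
# cycle used by MGPCG. All names here are hypothetical, not from the paper.

def apply_poisson_1d(x):
    """Return A @ x for the tridiagonal matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def jacobi_preconditioner(r):
    """Apply M^{-1} to r, where M = diag(A) and every diagonal entry is 2."""
    return [ri / 2.0 for ri in r]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(apply_a, apply_m_inv, b, tol=1e-10, max_iter=1000):
    """Solve A x = b starting from x = 0; return (x, iterations used)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0                      # zero right-hand side: x = 0 is exact
    r = list(b)                          # residual r = b - A x with x = 0
    z = apply_m_inv(r)                   # preconditioned residual
    p = list(z)                          # first search direction
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = apply_a(p)                  # matrix-vector product
        alpha = rz / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it                 # relative residual small enough
        z = apply_m_inv(r)               # a multigrid cycle would go here
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Build b so that the exact solution is the all-ones vector.
    rhs = apply_poisson_1d([1.0] * 16)
    solution, iterations = pcg(apply_poisson_1d, jacobi_preconditioner, rhs)
    print(iterations, max(abs(v - 1.0) for v in solution))
```

Each iteration touches every vector several times while doing little arithmetic per element, which is consistent with the abstract's description of the kernels as memory-bound.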