Performance Modeling and Optimization of Parallel LU-SGS on Many-Core Processors for 3D High-Order CFD Simulations

Dali Li, Chuanfu Xu, Bin Cheng, Min Xiong, Xiang Gao, Xiaogang Deng
DOI: https://doi.org/10.1007/s11227-016-1943-0
IF: 3.3
2016-01-01
The Journal of Supercomputing
Abstract: Lower-upper symmetric Gauss–Seidel (LU-SGS) is a typical Gauss–Seidel method, and its inherent strong data dependency poses tough challenges for shared-memory parallelization. On early multi-core processors, the pipelined parallel LU-SGS approach achieves promising scalability. However, on emerging many-core processors such as Xeon Phi, experience from our in-house high-order CFD program shows that the parallel efficiency drops dramatically to less than 25%. In this paper, we model and analyze the performance of the pipelined parallel LU-SGS algorithm and present a two-level pipeline (TL-Pipeline) approach that uses nested OpenMP to further exploit fine-grained parallelism and mitigate the parallel performance bottlenecks. Our TL-Pipeline approach achieves a 20% performance gain for a regular problem \((256\times 256\times 256)\) on Xeon Phi. We also discuss some practical problems, including domain decomposition and algorithm parameter tuning for realistic CFD simulations. Generally, our work is applicable to the shared-memory parallelization of all Gauss–Seidel-like methods with intrinsic strong data dependency.
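The abstract does not give the performance model itself, so the following is only a textbook illustration of why a pipelined sweep can lose efficiency as the thread count grows. It assumes equal-cost blocks, no synchronization cost, and a simple rule that thread t may start block b only after thread t-1 has finished block b. The function name `pipeline_efficiency` and its parameters are hypothetical and are not taken from the paper.

```python
def pipeline_efficiency(num_threads: int, num_blocks: int) -> float:
    """Idealized efficiency of a pipeline with fill/drain overhead.

    Assumption (not from the paper): every thread processes `num_blocks`
    equal-cost blocks, and thread t can start block b only after
    thread t-1 has finished block b.
    """
    if num_threads < 1 or num_blocks < 1:
        raise ValueError("num_threads and num_blocks must be positive")
    # The last thread starts (num_threads - 1) steps late, then needs
    # num_blocks steps of useful work, so the sweep takes this many steps.
    total_steps = num_blocks + num_threads - 1
    # With perfect overlap the sweep would take num_blocks steps.
    return num_blocks / total_steps


if __name__ == "__main__":
    # Hypothetical thread counts, chosen only to contrast few vs. many threads.
    for threads in (4, 16, 60, 240):
        eff = pipeline_efficiency(threads, num_blocks=64)
        print(f"threads={threads:4d}  blocks=64  efficiency={eff:.3f}")
```

Under these assumptions, efficiency falls as the number of threads approaches or exceeds the number of blocks per thread, which is one plausible reason a many-core processor is harder to keep busy than an early multi-core one. Real behavior also depends on cache effects, load imbalance, and synchronization cost, none of which this sketch captures.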