Improving Performance for Simulating Complex Fluids on Massively Parallel Computers by Component Loop-Unrolling and Communication Hiding

Xiaowei Guo, Chao Li, Wei Li, Yu Cao, Yi Liu, Ran Zhao, Sen Zhang, Canqun Yang
DOI: https://doi.org/10.1109/hpcc-smartcity-dss50907.2020.00017
2020-01-01
Abstract: Due to the complex geometry and physical models of real-world engineering applications, the parallel performance of mainstream computational fluid dynamics (CFD) codes is unsatisfactory. For complex fluids, an extra stress tensor with nine components, governed by constitutive equations, adds a substantial amount of computation. This paper focuses on optimizing the most compute-intensive part of a complex-fluid simulation: the iterative linear solver for multi-component equations. Based on OpenFOAM, the most widely used open-source CFD code, we unrolled the component loops and replaced the blocking collective MPI calls with non-blocking communications. After rescheduling operations between the loops, the collective communications can be partly overlapped with computation. Taking the preconditioned conjugate gradient (PCG) algorithm as an example, we present the complete loop-unrolled algorithm for solving multi-component equations. Numerical experiments show a simulation time reduction of 8.0% to 29.0% for a demonstrative case with 2 million cells on 64 to 2048 cores. The proposed approach is a high-level scheduling algorithm and can be combined with other intra-component optimization algorithms, e.g. the pipelined CG methods.
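The scheduling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm and not OpenFOAM code: it only contrasts a component loop that waits on each global reduction with an unrolled variant that posts the reduction for one component and computes the local dot product of the next component while that reduction is in flight. All names here (`local_dot`, `fake_allreduce`, `blocking_schedule`, `overlapped_schedule`, `demo_x`, `demo_y`) are hypothetical, and a single worker thread stands in for a non-blocking MPI collective.

```python
from concurrent.futures import ThreadPoolExecutor


def local_dot(x, y):
    """Local part of a dot product, as one rank would compute it."""
    return sum(a * b for a, b in zip(x, y))


def fake_allreduce(partials):
    """Hypothetical stand-in for a global sum across ranks (not an MPI call)."""
    return sum(partials)


def blocking_schedule(ranks_x, ranks_y, n_comp):
    """Component loop that waits on each reduction before moving on."""
    results = []
    for c in range(n_comp):
        partials = [local_dot(x[c], y[c]) for x, y in zip(ranks_x, ranks_y)]
        results.append(fake_allreduce(partials))  # blocks until the sum is done
    return results


def overlapped_schedule(ranks_x, ranks_y, n_comp):
    """Unrolled schedule: the reduction of component c-1 runs in the
    background while the local dot product of component c is computed."""
    results = []
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for c in range(n_comp):
            # Local work for component c overlaps the in-flight reduction.
            partials = [local_dot(x[c], y[c]) for x, y in zip(ranks_x, ranks_y)]
            if pending is not None:
                results.append(pending.result())  # complete the earlier reduction
            pending = pool.submit(fake_allreduce, partials)  # post the next one
        if pending is not None:
            results.append(pending.result())
    return results


# Assumed toy data: 2 "ranks", 3 components, 2 local cells per rank.
demo_x = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
demo_y = [[[1, 1], [1, 1], [1, 1]], [[1, 1], [1, 1], [1, 1]]]

print(blocking_schedule(demo_x, demo_y, 3))
print(overlapped_schedule(demo_x, demo_y, 3))
```

Both schedules return the same per-component sums; only the order of waiting differs. In an actual MPI code the background task would presumably be a non-blocking collective such as `MPI_Iallreduce` completed by `MPI_Wait`, and any gain depends on how much local work is available to cover the communication latency.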