Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-core Sunway Supercomputer
Zhao Liu, Xuesen Chu, Xiaojing Lv, Hanyue Liu, Haohuan Fu, Guangwen Yang
DOI: https://doi.org/10.1145/3605573.3605605
2023-01-01
Abstract: The Lattice Boltzmann Method (LBM) is widely used in fluid dynamics, chemical engineering, materials science, and other domains. In this work, we present an optimized LBM implementation designed for high performance and scalability on advanced heterogeneous systems such as the new Sunway supercomputer. To this end, we employ several techniques: kernel fusion to improve temporal and spatial locality, a customized multi-level domain decomposition and data-sharing scheme, and pipelining strategies tailored to the SW26010-Pro processor. With these optimizations, we scale our code to a total of 39,000,000 CPU cores. Our largest simulation, with more than 42 trillion lattice cells, achieves 67,018 billion lattice cell updates per second (67,018 GLUPS) with 82.9% memory bandwidth utilization, and a sustained performance of 28 PFlops. To assess the portability of our implementation, we also adapt the code to a GPU cluster using a range of tailored optimization techniques, obtaining a 191x speedup and 83.8% memory bandwidth utilization. Our approach demonstrates unprecedented scalability for LBM implementations, using over 39,000,000 cores while maintaining high parallel efficiency and computational performance, which makes it a compelling solution for large-scale computational fluid dynamics problems on heterogeneous systems.
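To make "kernel fusion" concrete, the sketch below fuses the two phases of an LBM time step (streaming and collision) into a single pass over the lattice, so each cell's populations are read once and written once instead of being swept through memory twice. It is a minimal illustration under stated assumptions, not the authors' SW26010-Pro code: it assumes a 2D D2Q9 lattice, BGK collision, periodic boundaries, pull-style streaming, and a two-lattice (double-buffered) layout, all in pure Python. The names `CX`, `CY`, `W`, `equilibrium`, and `fused_step` are hypothetical.

```python
# Minimal sketch of a fused stream-and-collide LBM step.
# Assumptions (not from the paper): D2Q9 lattice, BGK collision,
# periodic boundaries, pull streaming, two-lattice storage.

# D2Q9 discrete velocities: rest, 4 axis directions, 4 diagonals.
CX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
CY = [0, 0, 1, 0, -1, 1, 1, -1, -1]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for all 9 directions."""
    usq = ux * ux + uy * uy
    feq = []
    for q in range(9):
        cu = CX[q] * ux + CY[q] * uy
        feq.append(W[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def fused_step(f, nx, ny, tau):
    """One time step: pull-stream then BGK-collide, in one pass per cell.

    f[q][y][x] holds the populations; a new array is returned, so reads
    (old lattice) and writes (new lattice) never alias.
    """
    f_new = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    omega = 1.0 / tau
    for y in range(ny):
        for x in range(nx):
            # Pull: gather each population from its upstream neighbour.
            fl = [f[q][(y - CY[q]) % ny][(x - CX[q]) % nx] for q in range(9)]
            # Macroscopic moments from the gathered (streamed) values.
            rho = sum(fl)
            ux = sum(fl[q] * CX[q] for q in range(9)) / rho
            uy = sum(fl[q] * CY[q] for q in range(9)) / rho
            feq = equilibrium(rho, ux, uy)
            # BGK relaxation, written straight to the destination lattice.
            for q in range(9):
                f_new[q][y][x] = fl[q] - omega * (fl[q] - feq[q])
    return f_new
```

Because streaming only permutes populations and BGK collision conserves each cell's density, total mass should stay constant across steps up to rounding; checking this is a cheap sanity test for any fused kernel. A production kernel would additionally block the loops for cache or local memory and vectorize the inner direction loop, which this sketch does not attempt.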