Abstract: The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture (MSA) at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the MSA. We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can spread across more than one module and obtain a faster time to solution.

What problem does this paper attempt to address?

The main problem this paper addresses is how to run large-scale, high-fidelity computational fluid dynamics (CFD) simulations efficiently on the Modular Supercomputing Architecture (MSA), and in particular how to distribute work across modules to improve performance and reduce time to solution when a simulation exceeds the capacity of a single module.

### Specific problems include:

1. **Utilization of Modular Supercomputing Architecture**:
   - How to assign work to different computing modules (such as GPU modules and CPU modules) through domain decomposition in MSA, so that the modules compute in parallel efficiently.
   - Which mode of operation is best for workloads of different sizes on a heterogeneous MSA system.
2. **Communication Overhead and Load Balancing**:
   - How communication overhead and load imbalance affect performance when different computing architectures (such as GPUs and CPUs) are combined.
   - When and how work can be spread across multiple modules to reduce time to solution.
3. **Application of Performance Models**:
   - Use a simple performance model to estimate the potential gain from running on multiple computing modules.
   - Determine under which circumstances using multiple modules improves performance significantly, especially when the problem is too large to fit in one module.
4. **Challenges in Practical Applications**:
   - Whether combining different computing modules is attractive in a production environment, with concrete examples of performance improvement.

### Core contributions of the paper:

- **Empirical Comparison**: Evaluate the performance of different flow cases under different GPU/CPU configurations, including the impact of I/O on load balancing.
- **Performance Model**: Develop a simple performance model for analyzing the performance potential of running on multiple architectures.
- **Performance Improvement**: When the simulation does not fit entirely on the GPU module, a performance improvement of up to 2.7 times is observed by using the GPU and CPU modules simultaneously.

### Conclusion:

Through empirical measurements and a performance model, the paper examines when large-scale, high-fidelity CFD simulations benefit from running across modules of the modular supercomputing architecture. The results serve as a reference for running CFD simulations on heterogeneous computing platforms, especially for deciding how to distribute work across computing resources to reduce time to solution.
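The trade-off summarized above (communication overhead and load imbalance versus extra memory) can be illustrated with a toy calculation. The sketch below is not the paper's performance model. It is a minimal illustration under stated assumptions: each module processes elements at a constant rate, each node holds a fixed number of elements, and coupling the modules costs a fixed overhead per time step. The names `Module`, `step_time`, `split_step_time`, and `compare`, and all numbers in the usage note, are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class Module:
    """One MSA module (e.g. GPU Booster or CPU Cluster); all values are assumed."""
    nodes: int
    rate: float       # elements processed per second per node
    mem_elems: float  # maximum number of elements that fit on one node

    @property
    def capacity(self) -> float:
        return self.nodes * self.mem_elems

    @property
    def throughput(self) -> float:
        return self.nodes * self.rate


def step_time(n_elems: float, module: Module) -> float:
    """Time per step on a single module; infinite if the problem does not fit."""
    if n_elems > module.capacity:
        return math.inf
    return n_elems / module.throughput


def split_step_time(n_elems: float, gpu: Module, cpu: Module, overhead: float) -> float:
    """Time per step when the domain is partitioned across both modules.

    Elements are split in proportion to throughput (balanced load), but the
    GPU share is capped by GPU memory; the remainder goes to the CPU module.
    The slower module sets the pace, and a fixed coupling overhead is added.
    """
    balanced_gpu_share = n_elems * gpu.throughput / (gpu.throughput + cpu.throughput)
    n_gpu = min(balanced_gpu_share, gpu.capacity)
    n_cpu = n_elems - n_gpu
    if n_cpu > cpu.capacity:
        return math.inf
    return max(n_gpu / gpu.throughput, n_cpu / cpu.throughput) + overhead


def compare(n_elems: float, gpu: Module, cpu: Module, overhead: float) -> dict:
    """Return the estimated time per step for each configuration."""
    return {
        "gpu_only": step_time(n_elems, gpu),
        "cpu_only": step_time(n_elems, cpu),
        "split": split_step_time(n_elems, gpu, cpu, overhead),
    }
```

With hypothetical modules `gpu = Module(4, 1000.0, 10000)` and `cpu = Module(4, 100.0, 100000)` and an overhead of 1 second per step, `compare(20000, gpu, cpu, 1.0)` favors the GPU-only run because the problem fits in GPU memory and the split only adds overhead. For `compare(60000, gpu, cpu, 1.0)` the GPU-only run is infeasible and the split beats the CPU-only run, which mirrors the qualitative conclusion of the abstract.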

Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Scaling Computational Fluid Dynamics: In Situ Visualization of NekRS using SENSEI

A modular massively parallel computing environment for three-dimensional multiresolution simulations of compressible flows

Towards a Scalable Hierarchical High-order CFD Solver

Efficiency and scalability of fully-resolved fluid-particle simulations on heterogeneous CPU-GPU architectures

Scalable Flow Simulations with the Lattice Boltzmann Method

Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

Exascale Computational Fluid Dynamics in Heterogeneous Systems

Solving global shallow water equations on heterogeneous supercomputers

Acceleration for CFD Applications on Large GPU Clusters: an NPB Case Study

Accelerating CFD simulation with high order finite difference method on curvilinear coordinates for modern GPU clusters

Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems

Collaborating CPU and GPU for Large-Scale High-Order CFD Simulations with Complex Grids on the TianHe-1A Supercomputer

Heterogeneous Computing and Optimization on Tianhe-2 Supercomputer System for High-Order Accurate CFD Applications

Method for portable, scalable, and performant GPU-accelerated simulation of multiphase compressible flow

High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

Towards a Peta-scale Unstructured Computational Fluid Dynamics (CFD) Acceleration Toolkit Based on Sunway Architectures?

Towards Exascale for Wind Energy Simulations

Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-core Sunway Supercomputer

Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics