Abstract: The moving particle semi-implicit (MPS) method performs well in simulating incompressible flows with free surfaces. Despite its applicability, the MPS method suffers from fundamental instability and high computational cost in practical applications. Substantial research has been devoted to improving its stability and accuracy. Meanwhile, graphics processing units (GPUs), massively parallel processors designed to execute large numbers of three-dimensional geometric operations at high speed, offer unprecedented capability for scientific computing. However, a single GPU card is not sufficient for engineering applications that require several million particles to resolve the physical processes of interest, because its memory is too small. In this work, the dynamic stability (DS) algorithm and the particle shifting (PS) algorithm are used to overcome the instability caused by tensile instability and the inaccuracy caused by non-uniform particle distribution, respectively. Based on this stabilized MPS method, a GPU-based MPS code has been developed using the Compute Unified Device Architecture (CUDA). An efficient neighbor particle search is performed using an indirect method, and the matrix of the pressure Poisson equation (PPE) is assembled in parallel. Building on the single-GPU version, a multi-GPU MPS code has been developed. It uses a non-geometric dynamic domain decomposition method that provides homogeneous load balancing, assigning different portions (subdomains) of the physical system under study to different GPUs. Communication between devices is handled with the Message Passing Interface (MPI). The techniques for building and updating the “halo”, which rely on the neighbor particle search, are described in detail.
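The abstract does not spell out its "indirect" neighbor search. The sketch below assumes the common cell-list approach in which only an array of particle indices is sorted by cell id (the particle data stays in place), with a uniform 2D grid whose cell size equals the effective radius re, so every neighbor lies in the 3 x 3 surrounding cells. It then evaluates the standard MPS particle number density with the kernel w(r) = re/r - 1. All names (`CellGrid`, `build_cells`, `neighbors`, `number_density`) and the choice re = 2.1 * l0 are illustrative assumptions, and this serial Python prototype is not the authors' CUDA code; on a GPU, each step would typically map to one thread per particle plus a parallel key-value sort (for example `thrust::sort_by_key`).

```python
import math
from collections import namedtuple

# Hypothetical container for the cell structure produced by the indirect sort.
CellGrid = namedtuple("CellGrid", "origin size nx ny order cell_start cell_end")


def cell_coords(p, origin, size):
    """Integer (cx, cy) of the uniform-grid cell that contains point p."""
    return int((p[0] - origin[0]) // size), int((p[1] - origin[1]) // size)


def build_cells(pos, re):
    """Sort particle indices (not particle data) by cell id; cell size = re."""
    origin = (min(p[0] for p in pos), min(p[1] for p in pos))
    nx = int((max(p[0] for p in pos) - origin[0]) // re) + 1
    ny = int((max(p[1] for p in pos) - origin[1]) // re) + 1
    ids = []
    for p in pos:
        cx, cy = cell_coords(p, origin, re)
        ids.append(cy * nx + cx)
    # Indirect method: only this permutation array is reordered.
    order = sorted(range(len(pos)), key=lambda i: ids[i])
    cell_start = [0] * (nx * ny)
    cell_end = [0] * (nx * ny)  # empty cells keep start == end == 0
    for k, i in enumerate(order):
        c = ids[i]
        if k == 0 or ids[order[k - 1]] != c:
            cell_start[c] = k  # first entry of this cell's run in `order`
        cell_end[c] = k + 1  # exclusive end of this cell's run
    return CellGrid(origin, re, nx, ny, order, cell_start, cell_end)


def neighbors(i, pos, grid, re):
    """Indices j != i with |r_j - r_i| < re, found by scanning the 3 x 3 cells."""
    cx, cy = cell_coords(pos[i], grid.origin, grid.size)
    found = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            x, y = cx + dx, cy + dy
            if not (0 <= x < grid.nx and 0 <= y < grid.ny):
                continue
            c = y * grid.nx + x
            for k in range(grid.cell_start[c], grid.cell_end[c]):
                j = grid.order[k]
                if j != i and math.dist(pos[i], pos[j]) < re:
                    found.append(j)
    return found


def weight(r, re):
    """Standard MPS kernel: w(r) = re/r - 1 for 0 < r < re, else 0."""
    return re / r - 1.0 if 0.0 < r < re else 0.0


def number_density(pos, re):
    """Particle number density n_i = sum over neighbors j of w(|r_j - r_i|)."""
    grid = build_cells(pos, re)
    return [sum(weight(math.dist(pos[i], pos[j]), re)
                for j in neighbors(i, pos, grid, re))
            for i in range(len(pos))]


# Usage: 10 x 10 square lattice with spacing l0 = 1.0 and assumed re = 2.1 * l0.
l0 = 1.0
re = 2.1 * l0
pos = [(ix * l0, iy * l0) for iy in range(10) for ix in range(10)]
n = number_density(pos, re)
```

Because the cell size equals re, two particles closer than re can differ by at most one cell index in each direction, which is why scanning the 3 x 3 block is sufficient; the same argument gives a 3 x 3 x 3 block in 3D.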
The speed-up of the single-GPU version is analyzed for different numbers of particles, and the scalability of the multi-GPU version is analyzed for different numbers of particles and GPUs. Finally, an application with more than 10⁷ particles demonstrates the code's capability to handle large-scale simulations.
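The abstract does not define its performance metrics. The helper below assumes the usual definitions: speed-up as a reference time divided by the measured time (for the single-GPU version the baseline is presumably a CPU run, which the abstract does not state), strong-scaling efficiency for a fixed total particle count, and weak-scaling efficiency for a fixed particle count per GPU. The function names are hypothetical, and the usage loop only illustrates the ideal case; it is not measured data.

```python
def speedup(t_ref, t):
    """Speed-up of a run taking time t relative to a reference run taking t_ref."""
    return t_ref / t


def strong_efficiency(t1, tn, n_gpus):
    """Strong scaling: fixed total particle count; ideal (1.0) when tn = t1 / n_gpus."""
    return t1 / (n_gpus * tn)


def weak_efficiency(t1, tn):
    """Weak scaling: fixed particles per GPU; ideal (1.0) when tn = t1."""
    return t1 / tn


# Illustrative ideal case only (not measured data): perfect strong scaling.
t1 = 120.0
for n in (1, 2, 4, 8):
    print(n, speedup(t1, t1 / n), strong_efficiency(t1, t1 / n, n))
```

Reporting efficiency alongside raw speed-up makes it easier to see where inter-GPU communication (for example, halo exchange) starts to dominate as more devices are added.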