Abstract: The moving particle semi-implicit (MPS) method performs well in simulating incompressible flows with free surfaces. Despite its applicability, the MPS method suffers from fundamental instability problems and high computational cost in practical applications. Substantial research has been conducted on improving its stability and accuracy. Moreover, graphics processing units (GPUs), which are multiprocessors capable of executing many three-dimensional geometric operations at high speed, provide unprecedented capability for scientific computations. However, a single GPU card is not sufficient for engineering applications that require several million particles to capture the physical processes of interest, because its available memory is insufficient. In this work, the dynamic stability (DS) algorithm and the particle shifting (PS) algorithm are used to overcome, respectively, the instability caused by tensile instability and the inaccuracy caused by non-uniform particle distribution. Based on this stabilized MPS method, a GPU-based MPS code has been developed using the compute unified device architecture (CUDA). An efficient neighborhood particle search is performed using an indirect method, and the matrix of the pressure Poisson equation (PPE) is assembled in parallel. Building on the single-GPU version, a multi-GPU MPS code has been developed. It uses a non-geometric dynamic domain decomposition method that provides homogeneous load balancing, whereby different portions (subdomains) of the physical system under study are assigned to different GPUs. Communication between devices is handled with the message passing interface (MPI). The techniques for building and updating the “halo” on the basis of the neighborhood particle search are described in detail.
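The abstract does not spell out the "indirect" neighborhood search. A common GPU pattern that fits the description is sketched below in plain Python (not the authors' CUDA code). Each particle gets a cell hash on a uniform grid whose cell size equals the MPS effective radius r_e. Particle indices, rather than the particle data, are sorted by hash, and per-cell start/end offsets then let each particle scan only the 3×3 block of cells around it. The function name `find_neighbors`, the 2D rectangular domain [0, lx) × [0, ly), and the cell size are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def find_neighbors(points: List[Point], r_e: float, lx: float, ly: float) -> List[List[int]]:
    """Return, for each particle, the sorted indices of particles closer than r_e.

    Hypothetical helper. Assumes a 2D domain [0, lx) x [0, ly) and a cell size
    equal to r_e, so every neighbor lies in the 3x3 block of cells around a
    particle's own cell.
    """
    nx = max(1, math.ceil(lx / r_e))
    ny = max(1, math.ceil(ly / r_e))

    # 1. Cell coordinates and a row-major cell hash for every particle.
    cells = [(min(int(x / r_e), nx - 1), min(int(y / r_e), ny - 1)) for x, y in points]
    hashes = [cy * nx + cx for cx, cy in cells]

    # 2. Indirect step: sort particle *indices* by hash; the particle data stays put.
    order = sorted(range(len(points)), key=hashes.__getitem__)

    # 3. Start/end offsets of each cell's run inside `order` (empty cells: start == end).
    start = [0] * (nx * ny)
    end = [0] * (nx * ny)
    for k, i in enumerate(order):
        h = hashes[i]
        if k == 0 or hashes[order[k - 1]] != h:
            start[h] = k
        end[h] = k + 1

    # 4. Each particle scans only the particles in its own and adjacent cells.
    r2 = r_e * r_e
    result: List[List[int]] = []
    for i, (x, y) in enumerate(points):
        cx, cy = cells[i]
        nbrs = []
        for gy in range(max(cy - 1, 0), min(cy + 2, ny)):
            for gx in range(max(cx - 1, 0), min(cx + 2, nx)):
                h = gy * nx + gx
                for k in range(start[h], end[h]):
                    j = order[k]
                    dx, dy = points[j][0] - x, points[j][1] - y
                    if j != i and dx * dx + dy * dy < r2:
                        nbrs.append(j)
        result.append(sorted(nbrs))
    return result


# Two particles 0.07 apart are mutual neighbors for r_e = 0.1; the third is isolated.
print(find_neighbors([(0.05, 0.05), (0.12, 0.05), (0.5, 0.4)], 0.1, 1.0, 0.5))
# [[1], [0], []]
```

On a GPU, the hash computation, the sort (for example a radix sort), and the offset scan would each be a parallel pass, and step 4 would run one thread per particle. The loops here are sequential only for readability.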
The speed-up of the single-GPU version is analyzed for different numbers of particles, and the scalability of the multi-GPU version is analyzed for different numbers of particles and different numbers of GPUs. Finally, an application with more than 10⁷ particles is presented to demonstrate the capability of the code to handle large-scale simulations.
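The abstract names a non-geometric dynamic domain decomposition with homogeneous load balancing, together with a halo built from the neighborhood search, but gives no algorithmic details. The single-process Python sketch below shows one plausible reading, assuming a uniform cell grid with cell size r_e. Particles sorted by cell hash are cut into equally sized contiguous slices, one per GPU, so particle counts per device differ by at most one regardless of where the fluid is. Each GPU's halo is then the set of particles owned by other GPUs that lie in cells adjacent to its own particles. The function `decompose` and its parameters are hypothetical, and the MPI exchange itself is omitted.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def decompose(points: List[Point], r_e: float, lx: float, ly: float,
              n_gpus: int) -> Tuple[List[int], List[Set[int]]]:
    """Assign particles to GPUs in equal-count slices and list each GPU's halo.

    Hypothetical helper. Returns (owner, halos): owner[i] is the GPU that owns
    particle i, and halos[g] holds the indices of particles owned by other GPUs
    that lie in the 3x3 cell block around some particle owned by GPU g.
    """
    nx = max(1, math.ceil(lx / r_e))
    ny = max(1, math.ceil(ly / r_e))
    cells = [(min(int(x / r_e), nx - 1), min(int(y / r_e), ny - 1)) for x, y in points]
    hashes = [cy * nx + cx for cx, cy in cells]

    # Non-geometric split: cut the hash-sorted order into n_gpus contiguous
    # slices whose sizes differ by at most one particle.
    order = sorted(range(len(points)), key=hashes.__getitem__)
    n = len(points)
    owner = [0] * n
    for k, i in enumerate(order):
        owner[i] = k * n_gpus // n

    # Cell -> particle lookup.
    by_cell: List[List[int]] = [[] for _ in range(nx * ny)]
    for i, h in enumerate(hashes):
        by_cell[h].append(i)

    # Halo of GPU g: foreign particles in the 3x3 cell block around any particle g owns.
    # This is a superset of the particles within r_e of g's particles.
    halos: List[Set[int]] = [set() for _ in range(n_gpus)]
    for i, (cx, cy) in enumerate(cells):
        g = owner[i]
        for gy in range(max(cy - 1, 0), min(cy + 2, ny)):
            for gx in range(max(cx - 1, 0), min(cx + 2, nx)):
                for j in by_cell[gy * nx + gx]:
                    if owner[j] != g:
                        halos[g].add(j)
    return owner, halos
```

Under these assumptions, repeating the split every time step (or every few steps) as particles move is what would make the decomposition dynamic. In an MPI code, each rank would then send the halo particles it owns to the ranks that need them before computing interactions.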

Predicting Accurate Hot Spots in a More Than Ten-Thousand-Core GPU with a Million-Time Speedup over FEM Enabled by a Physics-based Learning Algorithm

PODTherm-GP: A Physics-based Data-Driven Approach for Effective Architecture-Level Thermal Simulation of Multi-Core CPUs

PyPOD-GP: Using PyTorch for Accelerated Chip-Level Thermal Simulation of the GPU

Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code and Large-Scale Performance Test on TH-1A

Parallelized Implementation of the Finite Particle Method for Explicit Dynamics in GPU

Implementation of the moving particle semi-implicit method for free-surface flows on GPU clusters

FROM CPU TO GPU: GPU-BASED ELECTROMAGNETIC COMPUTING (GPUECO)

A GPU-accelerated linear system solution for the Galerkin finite element method applied to neutron diffusion equation

Rapid simulation of elastic problems based on GPU

Accelerating Phase-Change Heat Conduction Simulations on GPUs

Heterogeneous parallel computing method for 3D transient nonlinear thermomechanical problems on CPU-GPU platforms

A fast cosine transformation accelerated method for predicting effective thermal conductivity

Generalized GPU Acceleration for Applications Employing Finite-Volume Methods

Accelerating Unstructured Large Eddy Simulation Solver with GPU

A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates

GPU power prediction via ensemble machine learning for DVFS space exploration

Forecasting GPU Performance for Deep Learning Training and Inference

GPU coprocessors as a service for deep learning inference in high energy physics

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms