Abstract: This research extends an existing state-of-the-art GPU framework for SPH simulation to achieve better performance without compromising numerical accuracy or sacrificing simulation detail. Towards this goal, we devise three new strategies. First, we design a novel hierarchical grid to decompose the simulation space, so that particles can be located in a more refined unit space. This organization improves the coherence of the particle distribution in global memory and, as a result, makes global memory operations more efficient. Second, building on this hierarchical grid, we propose a hierarchical neighbor search strategy that accommodates the heterogeneous distribution of particles across cells by searching for neighbor particles in grids of different levels. In a cell with dense particles, we search in a higher-level grid whose unit space is more refined, which narrows the search space and reduces accesses to false neighbor particles. In contrast, in a cell with sparse particles, we search in a lower-level grid whose unit space is relatively large, which reduces the number of loop iterations needed to traverse the entire neighbor cell and thus avoids unnecessary loop overhead. Given a reasonable neighbor search strategy, an efficient thread cooperation strategy can further improve the performance of our framework by exploiting more of the potential of GPGPU. The existing state-of-the-art method concentrates on a task assignment strategy that takes full advantage of shared memory: it decomposes the particles in a cell into several tasks and then assigns a cooperative thread array (CTA) to each task. However, this method does not fully consider the uniformity of tasks belonging to the same cell, as these tasks always share many particles with the same neighborhood. Finally, we propose a hierarchical task assignment strategy that merges such successive tasks into a larger task, so that the shared neighbor particles are loaded only once and the corresponding CTAs work together to handle the new task. Our method greatly reduces the redundant loading of neighbor particles and the overhead of neighbor search iterations. In our comprehensive tests, our work exhibits a 1.73× speedup compared with other state-of-the-art GPGPU frameworks for SPH simulation.
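
The trade-off behind the hierarchical neighbor search can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation: the two grid levels (cell sizes h and h/2), the function names `build_grid`, `candidates`, and `neighbors`, and the `dense_threshold` parameter are all assumptions made for illustration. The sketch only shows that a finer grid visits more cells but gathers fewer false candidates, while a coarser grid visits fewer cells but gathers more.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell_size):
    """Hash each particle index into a uniform grid of the given cell size."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        grid[key].append(idx)
    return grid


def candidates(grid, cell_size, p, h):
    """Gather indices from every cell overlapping the cube [p - h, p + h]."""
    lo = [math.floor((c - h) / cell_size) for c in p]
    hi = [math.floor((c + h) / cell_size) for c in p]
    found = []
    for key in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        found.extend(grid.get(key, ()))
    return found


def neighbors(points, i, h, coarse, fine, dense_threshold=8):
    """Return (sorted neighbor indices of particle i, number of candidates tested).

    Assumption: a particle's home cell counts as dense when it holds at least
    dense_threshold particles. Dense cells use the fine grid (cell size h/2),
    sparse cells use the coarse grid (cell size h).
    """
    p = points[i]
    home = tuple(math.floor(c / h) for c in p)
    if len(coarse.get(home, ())) >= dense_threshold:
        cand = candidates(fine, h / 2, p, h)   # more cells, fewer false candidates
    else:
        cand = candidates(coarse, h, p, h)     # fewer cells, more false candidates
    near = sorted(j for j in cand if j != i and math.dist(points[j], p) <= h)
    return near, len(cand)
```

Both branches cover the same cube around the particle, so they return the same neighbor set; only the number of visited cells and tested candidates differs. On a GPU the choice would be made per cell rather than per particle, which this sketch does not model.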

An Optimized GPU Implementation of Weakly-Compressible SPH Using CUDA-Based Strategies

An Improved GPU Acceleration Framework for Smoothed Particle Hydrodynamics

Accelerate Smoothed Particle Hydrodynamics Using GPU

Implementation of the moving particle semi-implicit method for free-surface flows on GPU clusters

A GPU-based SPH Algorithm

A GPU-Based Method for Weakly Compressible Fluids

A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates

A General Novel Parallel Framework for SPH-centric Algorithms

GPU-accelerated adaptive particle splitting and merging in SPH

GPGPU-based Smoothed Particle Hydrodynamic Fluid Simulation

Real-Time Incompressible Fluid Simulation on the GPU

Novel Hierarchical Strategies for SPH-centric Algorithms on GPGPU

A Parallel Implementation of A Smoothed Particle Hydrodynamics Method on Graphics Hardware Using the Compute Unified Device Architecture

CPU/GPU Heterogeneous Parallel CFD Solver and Optimizations

An Integrated Algorithm of Real-Time Fluid Simulation on GPU

Hydrodynamic Simulations using GPGPU Architectures

A Full GPU Implementation Framework of SPH Fluid Real-Time Simulation

A SPH-Based Fluid Simulation Framework on GPU

Implementation of the Moving Particle Semi-Implicit Method on GPU

Generalized GPU Acceleration for Applications Employing Finite-Volume Methods