Abstract: Graphics processing units (GPUs) are becoming a compelling means of accelerating geoscience numerical models because of their high computing performance. In this study, AMD's heterogeneous-compute interface for portability (HIP) was used to port the GPU-accelerated version of the piecewise parabolic method (PPM) solver (GPU-HADVPPM) from NVIDIA GPUs to China's domestic GPU-like accelerators, yielding GPU-HADVPPM4HIP. A multi-level hybrid parallelism scheme was further introduced to improve the overall computational performance of the HIP version of the CAMx (Comprehensive Air Quality Model with Extensions; CAMx-HIP) model on China's domestic heterogeneous cluster. The experimental results show that the acceleration achieved by GPU-HADVPPM on the different GPU accelerators becomes more pronounced as the computing scale increases, and the maximum speedup of GPU-HADVPPM on the domestic GPU-like accelerator is 28.9×. Hybrid parallelism with the message passing interface (MPI) and HIP achieves a speedup of up to 17.2× when 32 CPU cores and GPU-like accelerators are configured on the domestic heterogeneous cluster. Introducing OpenMP further reduces the computation time of the CAMx-HIP model by a factor of 1.9. More importantly, a comparison of the simulation results of GPU-HADVPPM on NVIDIA GPUs and on domestic GPU-like accelerators shows that the results on the domestic GPU-like accelerators exhibit smaller differences than those on the NVIDIA GPUs. We also show that the data transfer efficiency between the CPU and GPU has an essential impact on heterogeneous computing, and we point out that optimizing this efficiency is one of the critical directions for improving the computing efficiency of geoscience numerical models on heterogeneous clusters in the future.
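
The point about data transfer can be illustrated with a simple timing model. The sketch below is not taken from the study: it assumes that the time for an offloaded routine is the kernel time plus a fixed host-device transfer time, and every number in it (the 100 s CPU time, the 20× kernel speedup, the transfer times) is hypothetical. The function name `effective_speedup` is likewise introduced only for illustration.

```python
def effective_speedup(t_cpu, kernel_speedup, t_transfer):
    """Speedup of an offloaded routine once transfer time is included.

    t_cpu          -- time of the routine on the CPU alone (seconds)
    kernel_speedup -- how much faster the kernel itself runs on the accelerator
    t_transfer     -- total CPU<->GPU data transfer time (seconds)
    """
    t_kernel = t_cpu / kernel_speedup
    return t_cpu / (t_kernel + t_transfer)


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the paper.
    t_cpu = 100.0
    for t_transfer in (0.0, 1.0, 5.0, 20.0):
        s = effective_speedup(t_cpu, kernel_speedup=20.0, t_transfer=t_transfer)
        print(f"transfer {t_transfer:5.1f} s -> effective speedup {s:5.1f}x")
```

Under this simplified model, the effective speedup drops quickly once the transfer time approaches the kernel time, which is consistent with the abstract's emphasis on transfer efficiency.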

Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm

623 Tflop/s HPCG Run on Tianhe-2: Leveraging Millions of Hybrid Cores.

Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code and Large-Scale Performance Test on TH-1A.

A Hierarchical Grid Algorithm for Accelerating High-Performance Conjugate Gradient Benchmark on Sunway Many-Core Processor

HyGrid: A CPU-GPU Hybrid Convolution-Based Gridding Algorithm in Radio Astronomy.

Optimizing and Scaling HPCG on Tianhe-2: Early Experience

A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer

Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Reducing Communication Overhead in the High Performance Conjugate Gradient Benchmark on Tianhe-2

Enabling and Scaling the HPCG Benchmark on the Newest Generation Sunway Supercomputer with 42 Million Heterogeneous Cores

Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2

Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores

Accelerating the Simulation of Thermal Convection in the Earth's Outer Core on Tianhe-2.

Manycore Parallel Computing for a Hybridizable Discontinuous Galerkin Nested Multigrid Method

Accelerated 3D Full Band Self-consistent Ensemble Monte Carlo Device Simulation Utilizing Intel MIC Coprocessors on TianHe II

Multi-GPU Hybrid Programming Accelerated Three-Dimensional Phase-Field Model in Binary Alloy

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs.

Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer.

A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization

GPU-HADVPPM4HIP V1.0: using the heterogeneous-compute interface for portability (HIP) to speed up the piecewise parabolic method in the CAMx (v6.10) air quality model on China's domestic GPU-like accelerator

Experience of Parallelizing Cryo-EM 3D Reconstruction on a CPU-GPU Heterogeneous System