Abstract: CitcomCu is a numerical simulation code for mantle convection in geodynamics that can simulate thermo-chemical convection in a three-dimensional domain. Growing demand for higher-precision simulations at larger problem scales calls for larger computing systems. However, the parallel efficiency of CitcomCu is difficult to improve on large-scale parallel computing systems; in particular, the code is not adapted to the current mainstream heterogeneous high-performance computing architecture that combines CPUs with accelerators. In this paper, we propose a parallel computing framework for geodynamic numerical simulation on heterogeneous computing architectures, based on the Tianhe new-generation high-performance computer. First, we optimize the data partitioning mode of CitcomCu for the large-scale heterogeneous computing architecture to reduce the overall communication overhead. Second, we improve the iterative solution algorithm of CitcomCu to speed up the solution process. Finally, we use NEON SIMD instructions for the sparse matrix operations in the solution process to improve parallel efficiency. Using this framework, the optimized CitcomCu was deployed and tested on the Tianhe new-generation high-performance computer. On a single node, the performance of the optimized program was 3.3975 times higher than that of the unoptimized program. Relative to a 50,000-core baseline, the unoptimized program reached a parallel efficiency of 36.75% on one million computational cores, whereas the optimized program reached 42.71%, a relative improvement of 16.22%. In addition, the optimized program can be executed on 40 million computational cores with a parallel efficiency of 36.54%.
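
The one-million-core efficiency figures are consistent with reading the 16.22% as a relative gain over the unoptimized efficiency rather than a percentage-point difference. A minimal check under that reading is sketched below; the function and variable names are illustrative and do not come from the paper.

```python
def relative_gain(baseline: float, optimized: float) -> float:
    """Relative improvement of `optimized` over `baseline`, as a fraction."""
    return (optimized - baseline) / baseline


# Parallel efficiencies (in %) at one million cores, as quoted in the abstract.
unoptimized_eff = 36.75
optimized_eff = 42.71

# Assumption: the quoted 16.22% is (optimized - unoptimized) / unoptimized.
gain_percent = 100.0 * relative_gain(unoptimized_eff, optimized_eff)
print(f"relative gain: {gain_percent:.2f}%")  # prints "relative gain: 16.22%"
```

The percentage-point difference between the two efficiencies is 5.96, so the quoted 16.22% only matches the relative interpretation.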

Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer
