Abstract: Fast and accurate solution of the multi-group neutron diffusion equation is very important in numerical reactor physics calculations. Among the numerical methods, the finite difference method is simple, easy to program, and yields accurate results. However, it requires a fine mesh, which leads to long computing times. Parallel computing on platforms such as supercomputers is an effective way to reduce the computing time. Advanced supercomputers such as Sunway TaihuLight have heterogeneous structures, and similar structures are expected to be widely used in future exascale (E-level) supercomputers. However, parallel programming on heterogeneous structures is more complex, and there is currently little research on reactor physics software for Sunway TaihuLight. In this paper, a parallel program for solving the fine-mesh neutron diffusion equation by the finite difference method was developed on Sunway TaihuLight. The program uses a two-level parallel scheme: process-level parallelization between core groups with the Message Passing Interface (MPI) and thread-level parallelization within each core group. For the thread-level parallelization, two parallel programming interfaces designed specifically for Sunway TaihuLight, the high-level OpenACC* and the low-level Athread, were each tried for comparison. The two-group IAEA static 3-D PWR benchmark problem was used to verify the program's results and to test the parallel performance relative to the serial performance of the Sunway processor. The calculated results are shown to be correct by comparison with the reference results. Athread is shown to perform better than OpenACC* for the complicated iterations of the finite difference method. For a mesh of 170×170×380, the speedup ratio on a single core group with Athread is 12.907. With process-level parallelization, the program's speedup ratio currently reaches 201.322 on 64 core groups. The efficiency of the Computing Processing Elements (CPEs) is found to decrease as the number of CPEs increases. The program is shown to be highly parallelizable and to perform stably when higher accuracy is required.
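As a rough illustration of the kind of finite difference eigenvalue calculation the abstract refers to, the sketch below solves a much simpler problem than the paper's: a one-group, one-dimensional bare slab with zero-flux boundaries, using a central-difference mesh and power iteration for the multiplication factor. The function names, the cross-section values, and the direct tridiagonal solve are all assumptions made for this example; they are not taken from the paper, which treats a two-group 3-D benchmark with MPI and Athread/OpenACC* parallelization.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (inputs are not modified).

    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def k_eff_slab(D, sigma_a, nu_sigma_f, length, n, tol=1e-10, max_iter=1000):
    """Power iteration for -D*phi'' + sigma_a*phi = (1/k)*nu_sigma_f*phi.

    Uses n interior mesh points on (0, length) with phi = 0 at both ends.
    Returns (k, flux), with the flux normalized to a maximum of 1.
    """
    h = length / (n + 1)
    off = -D / h ** 2                      # coupling to neighbouring nodes
    lower = [off] * n
    upper = [off] * n
    diag = [2.0 * D / h ** 2 + sigma_a] * n
    phi = [1.0] * n                        # flat initial flux guess
    k = 1.0
    for _ in range(max_iter):
        source = [nu_sigma_f * p / k for p in phi]
        phi_new = solve_tridiagonal(lower, diag, upper, source)
        # Uniform nu_sigma_f, so the fission-source ratio is the flux-sum ratio.
        k_new = k * sum(phi_new) / sum(phi)
        peak = max(phi_new)
        phi = [p / peak for p in phi_new]
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    return k, phi


# Hypothetical material data (cm and 1/cm), chosen only for illustration.
D, SIGMA_A, NU_SIGMA_F, LENGTH = 1.0, 0.02, 0.03, 100.0
k_numeric, flux = k_eff_slab(D, SIGMA_A, NU_SIGMA_F, LENGTH, n=200)
# Analytic value for the continuous bare slab, with buckling (pi/length)^2.
k_analytic = NU_SIGMA_F / (SIGMA_A + D * (math.pi / LENGTH) ** 2)
print(f"numeric k = {k_numeric:.6f}, analytic k = {k_analytic:.6f}")
```

Refining the mesh (larger `n`) brings the numerical value closer to the analytic one, but the cost per iteration grows with the number of mesh points. In three dimensions that growth is far steeper, which is the motivation the abstract gives for parallelizing the fine-mesh calculation.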

5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Million-core Sunway Supercomputer

Analysis and MPI Implementation of LQCD Dslash on Sunway TaihuLight

Enabling and Scaling the HPCG Benchmark on the Newest Generation Sunway Supercomputer with 42 Million Heterogeneous Cores

Automatic Multi-Parameter Performance Modeling of HPC Applications on a New Sunway Supercomputer

swHPFM: Refactoring and Optimizing the Structured Grid Fluid Mechanical Algorithm on the Sunway TaihuLight Supercomputer

Heterogeneous Parallel Algorithm Design and Performance Optimization for WENO on the Sunway TaihuLight Supercomputer

The Sunway TaihuLight supercomputer: system and applications

Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-core Sunway Supercomputer

Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer

Sunway TaihuLight supercomputer makes its appearance

Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer

PARALLEL SOLUTION OF FINE-MESH NEUTRON DIFFUSION EQUATION ON HETEROGENEOUS STRUCTURE OF SUNWAY TAIHULIGHT SUPERCOMPUTER

Towards Efficient SpMV on Sunway Manycore Architectures

Redesigning LAMMPS for Peta-Scale and Hundred-Billion-atom Simulation on Sunway TaihuLight

Superblock-based performance optimization for Sunway Math Library on SW26010 many-core processor

A Hierarchical Grid Algorithm for Accelerating High-Performance Conjugate Gradient Benchmark on Sunway Many-Core Processor

Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer

O2ATH: an OpenMP Offloading Toolkit for the Sunway Heterogeneous Manycore Platform

Accelerating and Tuning Small Matrix Multiplications on Sunway TaihuLight: A Case Study of Spectral Element CFD Code Nek5000

26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight

Full Lifecycle Data Analysis on a Large-scale and Leadership Supercomputer: What Can We Learn from It?