Abstract: The variety of today's multicore architectures motivates researchers to explore parallel scientific applications on different platforms. Load imbalance is one performance issue that can prevent parallel applications from exploiting the computational power of these platforms. Ondes3D is a scientific application for seismic wave simulation used to assess the geological impact of earthquakes. Its parallelism relies on applying a regular domain decomposition to the provided geological domain and distributing each sub-domain to an MPI rank. Previous works investigated the significant spatial and temporal imbalance in Ondes3D and proposed new parallelization and load balancing techniques to minimize it. However, none explored its execution on different architectures. Our paper evaluates the performance of Ondes3D for two earthquake scenarios on eight different multicore architectures, including Intel, AMD, and ARM processors. We measure the load distribution per MPI rank, evaluate the temporal load imbalance, and compare the execution of the application's kernels. Our results show that the temporal load imbalance in Ondes3D depends on the chosen architecture, with some platforms minimizing such imbalance more effectively.
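The regular domain decomposition described in the abstract can be sketched as follows. This is a minimal illustration, not Ondes3D's code: it assumes a 2-D `px` × `py` process grid over the horizontal X and Y axes, with every rank keeping the full depth and ranks numbered row-major. The names `split_1d` and `decompose_xy` are hypothetical. Each rank receives a contiguous block of grid points, and block sizes differ by at most one point per axis.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open [start, end) interval of grid indices


def split_1d(n: int, parts: int) -> List[Range]:
    """Split n grid points into `parts` contiguous ranges whose sizes
    differ by at most one point (the first `extra` parts get one more)."""
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose_xy(nx: int, ny: int, px: int, py: int) -> Dict[int, Tuple[Range, Range]]:
    """Map each rank of a hypothetical px-by-py process grid to its (x, y)
    sub-domain. Assumed rank numbering: rank = j * px + i."""
    xs = split_1d(nx, px)
    ys = split_1d(ny, py)
    layout = {}
    for j, (y0, y1) in enumerate(ys):
        for i, (x0, x1) in enumerate(xs):
            layout[j * px + i] = ((x0, x1), (y0, y1))
    return layout


if __name__ == "__main__":
    # A 10 x 6 horizontal grid split over a 2 x 2 process grid.
    for rank, (xr, yr) in decompose_xy(10, 6, 2, 2).items():
        print(rank, xr, yr)
```

Because such a split only equalizes the number of grid points, it does not by itself equalize the work per rank, so the spatial and temporal imbalance mentioned in the abstract can persist.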


### What problems does this paper attempt to solve?

This paper studies and evaluates the temporal load imbalance of the Ondes3D seismic wave simulator on different multicore architectures. Specifically, it addresses the following key problems:

1. **Evaluating temporal load imbalance**
   - The paper evaluates the temporal load imbalance of Ondes3D when executing two earthquake scenarios on eight different multicore architectures, including Intel, AMD, and ARM processors.
   - The researchers measured the load distribution per MPI rank and compared the execution of the application's kernels across architectures.
2. **Influence of the architecture on load imbalance**
   - The paper examines how each multicore architecture affects the temporal load imbalance of Ondes3D and identifies which architectures minimize it more effectively.
   - The experimental results show that the temporal load imbalance depends on the selected architecture, and that some architectures perform better for specific earthquake scenarios.
3. **Guiding platform selection**
   - An important goal of the research is to help researchers select the multicore architecture best suited to running Ondes3D, in order to minimize load imbalance and improve performance.
   - The experimental results show that the AMD Zen 2 architecture performs best in execution time and spatial load balance, while the ARM ThunderX2 performs worst.
4. **Micro-kernel analysis**
   - The paper also analyzes in depth the execution of the individual micro-kernels of Ondes3D, especially the CPML4 micro-kernel, to understand the performance differences between architectures.
   - By comparing the assembly code generated for each architecture, the researchers found that the ARM instruction set differs significantly from those of the other architectures, which may affect application performance.

### Main contributions

- **Performance of Ondes3D on different multicore architectures**: a detailed performance evaluation on eight architectures for two earthquake scenarios.
- **Load distribution per MPI rank**: an analysis of how the load is distributed, revealing the load imbalance on each architecture.
- **Temporal load imbalance**: a comparison of executions across architectures that characterizes the temporal load imbalance.
- **Execution of the main computational kernels**: an individual evaluation of the application's main kernels that further pins down the causes of load imbalance.
- **Dependence of temporal load imbalance on the architecture**: experimental evidence that the choice of architecture has a significant impact on temporal load imbalance, together with an indication of the best choice.

These results give researchers in scientific computing a reference for choosing a multicore architecture that optimizes application performance and minimizes load imbalance.
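Temporal load imbalance can be quantified per time step rather than over the whole run. The sketch below uses the percent-imbalance metric (max / mean - 1) × 100, one common choice that is not necessarily the metric used in the paper; the function names and the timing values are hypothetical. It also shows why the per-step view matters: two ranks whose loads alternate between steps have equal totals, so an aggregate measure reports no imbalance even though every step is about 33% imbalanced.

```python
from statistics import mean
from typing import List, Sequence


def percent_imbalance(loads: Sequence[float]) -> float:
    """Return (max / mean - 1) * 100; 0.0 means perfectly balanced."""
    avg = mean(loads)
    if avg == 0:
        return 0.0  # no work at all: treat as balanced
    return (max(loads) / avg - 1.0) * 100.0


def temporal_imbalance(per_step: Sequence[Sequence[float]]) -> List[float]:
    """per_step[t][r] is the compute time of MPI rank r in time step t.
    Returns one imbalance value per time step."""
    return [percent_imbalance(step) for step in per_step]


def aggregate_imbalance(per_step: Sequence[Sequence[float]]) -> float:
    """Imbalance of the per-rank totals summed over all time steps."""
    totals = [sum(col) for col in zip(*per_step)]
    return percent_imbalance(totals)


if __name__ == "__main__":
    # Hypothetical timings (seconds) for 2 ranks over 2 steps: loads alternate.
    timings = [[2.0, 1.0], [1.0, 2.0]]
    print(temporal_imbalance(timings))   # roughly 33.3 at each step
    print(aggregate_imbalance(timings))  # 0.0, because the totals are equal
```

Since ranks synchronize every time step, the per-step values are the ones that reflect time spent waiting, which is why a temporal view can reveal imbalance that whole-run totals hide.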

Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures

Large Scale Numerical Simulation Via Parallelization and Reconfigurable Computing Hardware

Hybrid MPI+OpenMP Reactive Work Stealing in Distributed Memory in the PDE Framework sam(oa)^2

Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

Load Balancing For High Performance Computing Using Quantum Annealing

Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures

Muchisim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems

A Low Overhead Heterogeneous Parallel Optimization Method Based on 3-D Elastic Wave Numerical Simulation

Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application

A Case Study on Addressing Complex Load Imbalance in OpenMP

A Low Overhead Heterogeneous Parallel Optimization Method Based on Three-Dimensional Elastic Wave Numerical Simulation

A Dynamic Data Partition Algorithm Oriented to MPI and OpenMP

Revisiting Finite Difference and Spectral Migration Methods on Diverse Parallel Architectures

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels

On the Galactic Evolution of $D$ and $^3He$

Dynamic Load Balancing of Multi-GPU Parallelization for VULCANO VE-U7 Corium Spreading Analysis Using SOPHIA

Scalability Study of Molecular Dynamics Simulation on Godson-T Many-Core Architecture

Topology-Aware Space-Shared Co-Analysis of Large-Scale Molecular Dynamics Simulations

Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures

A performance analysis of a mimetic finite difference scheme for acoustic wave propagation on GPU platforms

Parallel seismic propagation simulation in anisotropic media by irregular grids finite difference method on PC cluster