Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures

Ana Luisa Veroneze Solórzano, Philippe Olivier Alexandre Navaux, Lucas Mello Schnorr
2024-09-18
Abstract: The variety of today's multicore architectures motivates researchers to explore parallel scientific applications on different platforms. Load imbalance is one performance issue that can prevent parallel applications from exploiting the computational power of these platforms. Ondes3D is a scientific application for seismic wave simulation used to assess the geological impact of earthquakes. Its parallelism relies on applying a regular domain decomposition to the provided geological domain and distributing each sub-domain to an MPI rank. Previous works investigate the significant spatial and temporal imbalance in Ondes3D and suggest new parallelization and load balancing techniques to minimize them. However, none explored its execution on different architectures. Our paper evaluates the performance of Ondes3D for two earthquake scenarios on eight different multicore architectures, including Intel, AMD, and ARM processors. We measure the load distribution per MPI rank, evaluate the temporal load imbalance, and compare the execution of the application's kernels. Our results show that the temporal load imbalance in Ondes3D depends on the architecture chosen, with some platforms minimizing such imbalance more effectively.
Performance; Distributed, Parallel, and Cluster Computing
### What problem does this paper attempt to address?

This paper studies the temporal load imbalance of the Ondes3D seismic wave simulator on different multicore architectures. Specifically, it addresses the following points:

1. **Evaluation of temporal load imbalance**:
   - The paper evaluates the temporal load imbalance of Ondes3D when running two earthquake scenarios on eight different multicore architectures (including Intel, AMD, and ARM processors).
   - The researchers measured the load distribution per MPI rank and compared the execution of the application's kernels across architectures.
2. **Influence of the architecture on load imbalance**:
   - The paper examines how each multicore architecture affects the temporal load imbalance of Ondes3D and identifies which architectures minimize it more effectively.
   - The experimental results show that the temporal load imbalance depends on the selected architecture, and that some architectures perform better for specific earthquake scenarios.
3. **Selecting a suitable platform**:
   - An important goal of the research is to help researchers select the multicore architecture best suited to running Ondes3D, in order to minimize load imbalance and improve performance.
   - The experimental results show that the AMD Zen 2 architecture performs best in terms of execution time and spatial load balancing, while the ARM ThunderX2 performs the worst.
4. **Micro-kernel analysis**:
   - The paper also analyzes the execution of the individual micro-kernels in Ondes3D, especially the CPML4 micro-kernel, to understand the performance differences between architectures.
   - By comparing the assembly code generated for the different architectures, the researchers found that the instructions generated for ARM differ significantly from those for the other architectures, which may affect application performance.

### Main contributions

- **Evaluating the performance of Ondes3D on different multicore architectures**: A detailed performance evaluation was carried out on eight architectures for two earthquake scenarios.
- **Evaluating the load distribution per MPI rank**: The load distribution was analyzed in detail, revealing the load imbalance on each architecture.
- **Evaluating temporal load imbalance**: Comparing executions across architectures showed how the imbalance evolves over time on each of them.
- **Evaluating the execution of the main computational kernels**: The main computational kernels of the application were evaluated individually to narrow down the causes of load imbalance.
- **Showing that temporal load imbalance depends on the architecture**: The experimental results show that the architecture has a significant impact on temporal load imbalance and indicate the best-performing architecture.

Together, these results give researchers in scientific computing a reference for choosing a multicore architecture that improves application performance and minimizes load imbalance.
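
As an illustration of how per-rank load measurements can be turned into a temporal imbalance series, the sketch below computes the common percent-imbalance metric, (max / mean - 1) × 100, for each time step. This is an assumption made for illustration: the summary does not state which metric the paper uses, and the function names and sample values here are hypothetical, not taken from Ondes3D or from the paper's data.

```python
from statistics import mean


def percent_imbalance(loads):
    """Percent imbalance of one time step: (max / mean - 1) * 100.

    `loads` holds one compute time per MPI rank (hypothetical input).
    0 means perfectly balanced; larger values mean the slowest rank
    is further above the average.
    """
    avg = mean(loads)
    if avg == 0:
        return 0.0  # no work recorded in this step
    return (max(loads) / avg - 1.0) * 100.0


def imbalance_over_time(loads_per_step):
    """Return one percent-imbalance value per time step."""
    return [percent_imbalance(step) for step in loads_per_step]


# Hypothetical compute times (seconds): 3 time steps x 4 MPI ranks.
samples = [
    [1.0, 1.0, 1.0, 1.0],  # balanced step
    [2.0, 1.0, 1.0, 0.0],  # one rank carries twice the average load
    [1.0, 3.0, 0.0, 0.0],  # the heavy load has moved to another rank
]

series = imbalance_over_time(samples)
for step, value in enumerate(series):
    print(f"step {step}: {value:.1f}% imbalance")
```

A series that changes from step to step, as in this made-up sample, is what distinguishes temporal imbalance from a purely spatial one. Comparing such series between platforms would be one way to express the architecture dependence described above.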