Hybrid MPI+OpenMP Reactive Work Stealing in Distributed Memory in the PDE Framework sam(oa)^2

Philipp Samfass, Jannis Klinkenberg, Michael Bader
DOI: https://doi.org/10.1109/cluster.2018.00051
2018-09-01
Abstract: “Equal work results in equal execution time” is an assumption that has fundamentally driven the design and implementation of parallel applications for decades. However, increasing hardware variability on current architectures (e.g., through Turbo Boost, dynamic voltage and frequency scaling, or thermal effects) necessitates a revision of this assumption. Expecting these effects to increase on future (exascale) systems, we develop a novel MPI+OpenMP-only distributed work stealing concept that, based on online performance monitoring, selectively steals and remotely executes tasks across MPI boundaries. This concept has been implemented in the parallel adaptive mesh refinement (AMR) framework sam(oa)^2 for the OpenMP tasks that traverse grid sections. Corresponding performance measurements in the presence of enforced CPU clock frequency imbalances demonstrate that a state-of-the-art cost-based (chains-on-chains partitioning) load balancing mechanism is insufficient and can even degrade performance, whereas additional distributed work stealing successfully mitigates the frequency-induced imbalances. Furthermore, our results indicate that our approach is also suitable for balancing work-induced imbalances in a realistic AMR test case.
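To make the abstract's central argument concrete, the following Python sketch contrasts a cost-based partition with a reactive rebalancing step. `chains_on_chains` is the textbook formulation of chains-on-chains partitioning (bisection on the bottleneck value with a greedy feasibility probe), not sam(oa)^2's code. The task weights, the per-rank `speeds`, and the `steal_rebalance` heuristic (the earliest-finishing rank takes the last task of the latest-finishing rank) are hypothetical illustrations, not the paper's stealing protocol. Under these assumptions, a partition that gives every rank exactly equal work still doubles the makespan when one rank runs at half clock speed, and a few steals bring it down from 8 to 5 time units.

```python
from typing import List, Tuple

Chunks = List[List[int]]


def probe(weights: List[int], p: int, bound: int) -> bool:
    """True if `weights` can be cut into at most p contiguous chunks,
    each with total weight <= bound (greedy left-to-right fill)."""
    parts, current = 1, 0
    for w in weights:
        if w > bound:
            return False
        if current + w > bound:
            parts += 1
            current = w
        else:
            current += w
    return parts <= p


def chains_on_chains(weights: List[int], p: int) -> Tuple[int, Chunks]:
    """Cost-based partitioning: smallest bottleneck over p contiguous chunks,
    found by bisection on integer bounds, plus one partition achieving it."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, p, mid):
            hi = mid
        else:
            lo = mid + 1
    chunks: Chunks = [[]]
    load = 0
    for w in weights:
        if load + w > lo:  # current rank is full: start the next rank's chunk
            chunks.append([])
            load = 0
        chunks[-1].append(w)
        load += w
    chunks += [[] for _ in range(p - len(chunks))]  # pad if greedy used fewer
    return lo, chunks


def rank_times(chunks: Chunks, speeds: List[float]) -> List[float]:
    """Execution time per rank = assigned work / (hypothetical) relative speed."""
    return [sum(c) / s for c, s in zip(chunks, speeds)]


def steal_rebalance(chunks: Chunks, speeds: List[float]) -> Chunks:
    """Hypothetical reactive step: the rank finishing first takes the last task
    of the rank finishing last, as long as this lowers the latest finish time."""
    chunks = [list(c) for c in chunks]
    while True:
        times = rank_times(chunks, speeds)
        victim = max(range(len(times)), key=times.__getitem__)
        thief = min(range(len(times)), key=times.__getitem__)
        if victim == thief or not chunks[victim]:
            break
        task = chunks[victim][-1]
        new_victim = times[victim] - task / speeds[victim]
        new_thief = times[thief] + task / speeds[thief]
        if max(new_victim, new_thief) >= times[victim]:
            break  # stealing would not shorten the makespan
        chunks[thief].append(chunks[victim].pop())
    return chunks


if __name__ == "__main__":
    weights = [1] * 16              # equal-cost tasks in a 1-D ordering (assumed)
    speeds = [1.0, 1.0, 0.5, 1.0]   # rank 2 runs at half clock speed (assumed)
    bound, chunks = chains_on_chains(weights, 4)
    print(bound, rank_times(chunks, speeds))  # 4 [4.0, 4.0, 8.0, 4.0]
    balanced = steal_rebalance(chunks, speeds)
    print(rank_times(balanced, speeds))       # [5.0, 5.0, 4.0, 4.0]
```

Stolen tasks are appended to the thief's list instead of re-cutting the chain, mimicking remote execution of a task that the original rank still owns; the partition itself is left unchanged, so only the execution of individual tasks moves between ranks.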