Detrimental task execution patterns in mainstream OpenMP runtimes

Adam S. Tuft, Tobias Weinzierl, Michael Klemm
2024-07-26
Abstract: The OpenMP API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific about how to implement them. As the predominant OpenMP implementations share design rationales, they introduce "quasi-standards" for how certain annotations behave. By means of a task-based astrophysical simulation code, we highlight situations where this "quasi-standard" reference behaviour introduces performance flaws. Therefore, we propose prescriptive clauses to constrain the OpenMP implementations. Simulated task traces uncover the clauses' potential, while a discussion of their realization highlights that they would amount to rather incremental changes to any OpenMP runtime supporting task priorities.
Programming Languages; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses performance problems that arise from how mainstream OpenMP runtimes execute tasks. Specifically:

1. **Task Generation Guarantee**: OpenMP leaves open whether a newly created task is executed immediately or deferred. This uncertainty can cause performance bottlenecks, in particular for work on the critical path, which must not be delayed. The paper suggests an API extension that enforces deferred execution of tasks, giving programmers control over task scheduling.
2. **Nested Parallelism**: According to the paper, the existing OpenMP specification offers no effective support for nested parallelism within tasks, which limits the concurrency the code can exploit. In the ExaHyPE framework, for example, expensive operations such as interpolation and restriction cannot be executed in parallel. The paper proposes extending the OpenMP API to allow nested parallel execution within tasks, thereby reducing the application's execution time.
3. **Fair Scheduling**: The paper discusses how the `taskyield` directive can schedule unfairly in a multithreaded environment, so that some tasks may starve. The authors suggest a fair scheduling mechanism that guarantees every ready task is eventually scheduled.
4. **Synchronization Semantics**: The paper analyzes how the `taskwait` and `taskloop` constructs implement synchronization and points out that these synchronization points can introduce additional algorithmic latency. It proposes extensions to these constructs that give finer control over synchronization between tasks, thereby improving performance.

In summary, the paper examines how new, prescriptive OpenMP API clauses that constrain runtime behaviour can improve task execution patterns and hence the performance of parallel applications.