Unified schemes for directive-based GPU offloading

Yohei Miki, Toshihiro Hanawa
DOI: https://doi.org/10.1109/ACCESS.2024.3509380
2024-11-28
Abstract: GPU is the dominant accelerator device due to its high performance and energy efficiency. Directive-based GPU offloading using OpenACC or OpenMP target is a convenient way to port existing codes originally developed for multicore CPUs. Although OpenACC and OpenMP target provide similar features, both methods have pros and cons. OpenACC has better functions and an abundance of documents, but it is virtually for NVIDIA GPUs. OpenMP target supports NVIDIA/AMD/Intel GPUs but has fewer functions than OpenACC. Here, we have developed a header-only library, Solomon (Simple Off-LOading Macros Orchestrating multiple Notations), to unify the interface for GPU offloading with the support of both OpenACC and OpenMP target. Solomon provides three types of notations to reduce users' implementation and learning costs: intuitive notation for beginners and OpenACC/OpenMP-like notations for experienced developers. This manuscript denotes Solomon's implementation and usage and demonstrates the GPU-offloading in $N$-body simulation and the three-dimensional diffusion equation. The library and sample codes are provided as open-source software and publicly and freely available at https://github.com/ymiki-repo/solomon.
Distributed, Parallel, and Cluster Computing; Instrumentation and Methods for Astrophysics; Performance; Programming Languages
### What problems does this paper attempt to solve?

This paper addresses several key problems in directive-based GPU programming:

1. **Unified Interface**: When using OpenACC and OpenMP target for GPU offloading, developers currently have to write code in a different style for each backend, which raises development and maintenance costs. The paper proposes a header-only library named Solomon that provides a unified interface, so that the same code can run on NVIDIA, AMD, and Intel GPUs without significant modification.
2. **Reducing Vendor Lock-in**: OpenACC is, in practice, limited to NVIDIA GPUs, while OpenMP target supports GPUs from several vendors but offers fewer functions. Choosing either one alone therefore ties a code to a vendor or to a smaller feature set. Solomon reduces this dependence by supporting both OpenACC and OpenMP target, improving the portability and flexibility of the code.
3. **Reducing Learning Cost**: For experienced developers, migrating from OpenACC to OpenMP target, or vice versa, carries an additional learning cost. Solomon provides three types of notation (intuitive notation, OpenACC-like notation, and OpenMP-like notation) to suit developers with different backgrounds.
4. **Simplifying Performance Comparison**: Solomon lets developers switch between OpenACC and OpenMP target easily, so the performance of the two backends can be compared directly and the more suitable one chosen for a given application.

### Main contributions of the paper

- **Unified Interface**: Solomon supports both OpenACC and OpenMP target directives, enabling the same code to run on GPUs from different vendors.
- **Reducing Vendor Lock-in**: By supporting GPUs from multiple vendors, it reduces dependence on any single vendor.
- **Reducing Learning Cost**: It provides multiple notations for developers with different backgrounds, making the library easier to learn and use.
- **Facilitating Performance Comparison**: It simplifies switching between the two programming models, which facilitates performance evaluation.

Through these improvements, Solomon improves the portability and ease of use of the code and gives developers more flexible choices in an increasingly diverse GPU computing environment.
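
The unified-interface idea summarized above can be illustrated with a small sketch. It is not Solomon's actual API: the macro names `DEMO_PRAGMA`, `DEMO_OFFLOAD_LOOP`, `DEMO_USE_OPENACC`, and `DEMO_USE_OPENMP_TARGET` are hypothetical and chosen only for this example. The sketch assumes a C99 compiler and uses the standard `_Pragma` operator so that one macro can expand to either an OpenACC or an OpenMP target directive, or to nothing when neither backend is selected.

```c
#include <assert.h>

/* Hypothetical helper macros (NOT Solomon's API): build a pragma from
   a token sequence using the C99 _Pragma operator. */
#define DEMO_STR(x) #x
#define DEMO_PRAGMA(x) _Pragma(DEMO_STR(x))

#if defined(DEMO_USE_OPENACC)
/* OpenACC backend: copy the input in, copy the output in and out. */
#define DEMO_OFFLOAD_LOOP(in, out, n) \
  DEMO_PRAGMA(acc parallel loop copyin(in[0:n]) copy(out[0:n]))
#elif defined(DEMO_USE_OPENMP_TARGET)
/* OpenMP target backend: equivalent data mapping with map clauses. */
#define DEMO_OFFLOAD_LOOP(in, out, n) \
  DEMO_PRAGMA(omp target teams distribute parallel for map(to: in[0:n]) map(tofrom: out[0:n]))
#else
/* Fallback: no directive is emitted, so the loop runs serially on the host. */
#define DEMO_OFFLOAD_LOOP(in, out, n)
#endif

/* y[i] = a * x[i] + y[i]; the loop body is identical for every backend. */
void saxpy(float a, const float *x, float *y, int n) {
  assert(n >= 0);
  DEMO_OFFLOAD_LOOP(x, y, n)
  for (int i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}

/* Small self-check: returns 1 when every element equals 2*1 + 3 = 5. */
int saxpy_self_check(void) {
  enum { LEN = 8 };
  float x[LEN], y[LEN];
  for (int i = 0; i < LEN; i++) {
    x[i] = 1.0f;
    y[i] = 3.0f;
  }
  saxpy(2.0f, x, y, LEN);
  for (int i = 0; i < LEN; i++) {
    if (y[i] != 5.0f) return 0;
  }
  return 1;
}
```

Switching backends in this sketch only requires defining one of the two hypothetical selector macros at compile time, together with the compiler's own OpenACC or OpenMP offloading option. The source of `saxpy` stays unchanged, which is also what makes a like-for-like performance comparison between the two backends straightforward.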