Abstract: GPUs are the dominant accelerator devices owing to their high performance and energy efficiency. Directive-based GPU offloading using OpenACC or OpenMP target is a convenient way to port existing codes originally developed for multicore CPUs. Although OpenACC and OpenMP target provide similar features, each has pros and cons. OpenACC offers richer functionality and more abundant documentation, but in practice it is limited to NVIDIA GPUs. OpenMP target supports NVIDIA, AMD, and Intel GPUs but offers fewer functions than OpenACC. Here, we have developed a header-only library, Solomon (Simple Off-LOading Macros Orchestrating multiple Notations), to unify the interface for GPU offloading with support for both OpenACC and OpenMP target. Solomon provides three types of notations to reduce users' implementation and learning costs: an intuitive notation for beginners and OpenACC-like and OpenMP-like notations for experienced developers. This manuscript describes Solomon's implementation and usage and demonstrates GPU offloading in $N$-body simulation and the three-dimensional diffusion equation. The library and sample codes are provided as open-source software and are publicly and freely available at \url{https://github.com/ymiki-repo/solomon}.

What problem does this paper attempt to address?

This paper aims to solve several key problems in directive-based GPU programming:

1. **Unified Interface**: When using OpenACC and OpenMP target for GPU offloading, developers currently need to write code in a different style for each backend, which increases development and maintenance costs. The paper proposes a library named Solomon that provides a unified interface, enabling the same piece of code to run on NVIDIA, AMD, and Intel GPUs without significant modification.
2. **Reducing Vendor Lock-in**: OpenACC mainly supports NVIDIA GPUs, while OpenMP target supports GPUs from multiple vendors but offers fewer functions, so committing to either one alone can lead to vendor lock-in. Solomon reduces dependence on a specific vendor by supporting both OpenACC and OpenMP target, improving the portability and flexibility of the code.
3. **Reducing Learning Cost**: For experienced developers, migrating from OpenACC to OpenMP target, or vice versa, incurs an additional learning cost. Solomon provides three types of notation (intuitive, OpenACC-like, and OpenMP-like) to suit developers with different backgrounds, thereby reducing that cost.
4. **Simplifying Performance Comparison**: Solomon allows developers to switch easily between OpenACC and OpenMP target, making it convenient to compare the performance of the two backends and to select the more suitable one for a specific application.

### Main contributions of the paper

- **Unified Interface**: Solomon supports both OpenACC and OpenMP target directives, enabling the same piece of code to run on GPUs from different vendors.
- **Reducing Vendor Lock-in**: By supporting GPUs from multiple vendors, it reduces dependence on any single vendor.
- **Reducing Learning Cost**: It provides multiple notations for developers with different backgrounds, lowering the barrier to learning and use.
- **Facilitating Performance Comparison**: It simplifies switching between programming models, which facilitates performance evaluation.

Through these improvements, Solomon improves the portability and ease of use of the code and gives developers more flexible choices for coping with an increasingly diverse GPU computing environment.
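The unified-interface idea can be illustrated with a small sketch of how a header-only macro layer might select a backend at compile time. This is not Solomon's actual API: every `DEMO_*` name and the `demo_saxpy` function are hypothetical and were chosen only for illustration. The sketch assumes the standard `_OPENACC` and `_OPENMP` predefined macros to detect the active backend and uses the C99 `_Pragma` operator to emit the directive. Without either backend enabled, the macros expand to nothing and the loop runs serially on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros (NOT Solomon's API). The two-level expansion
 * lets clause macros expand before the text is stringized for _Pragma. */
#define DEMO_PRAGMA_(...) _Pragma(#__VA_ARGS__)
#define DEMO_PRAGMA(...) DEMO_PRAGMA_(__VA_ARGS__)

#if defined(_OPENACC)
/* OpenACC backend: the compiler defines _OPENACC when OpenACC is enabled. */
#define DEMO_OFFLOAD_LOOP(...) DEMO_PRAGMA(acc parallel loop __VA_ARGS__)
#define DEMO_COPYIN(...) copyin(__VA_ARGS__)
#define DEMO_COPY(...) copy(__VA_ARGS__)
#elif defined(_OPENMP)
/* OpenMP target backend: the compiler defines _OPENMP when OpenMP is enabled. */
#define DEMO_OFFLOAD_LOOP(...) \
    DEMO_PRAGMA(omp target teams distribute parallel for __VA_ARGS__)
#define DEMO_COPYIN(...) map(to: __VA_ARGS__)
#define DEMO_COPY(...) map(tofrom: __VA_ARGS__)
#else
/* No offloading backend: the macros vanish and the loop runs serially. */
#define DEMO_OFFLOAD_LOOP(...)
#define DEMO_COPYIN(...)
#define DEMO_COPY(...)
#endif

/* y[i] = a * x[i] + y[i], written once for all three build modes. */
static void demo_saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    DEMO_OFFLOAD_LOOP(DEMO_COPYIN(x[0:n]) DEMO_COPY(y[0:n]))
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Under these assumptions, the same source file would be built with the compiler's OpenACC flag, its OpenMP offloading flag, or neither, and `demo_saxpy` would be called from ordinary host code in each case. Switching backends then becomes a build-time choice, which is also what makes a side-by-side performance comparison straightforward.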

Unified schemes for directive-based GPU offloading
