Abstract: Heterogeneous nodes composed of a multicore CPU and accelerators are today's norm in high-performance computing (HPC) platforms due to their superior performance and energy efficiency. Tools such as OpenCL and hybrid combinations such as OpenMP plus OpenACC are used for developing portable parallel programs for such nodes. However, these tools have some drawbacks, including a lack of compiler support for nested parallelism, performance portability, automatic heterogeneous workload distribution, user-friendly thread placement, and processor affinity essential to the portable performance of hybrid programs executing on such nodes. In this paper, we propose OpenH, a novel programming model and library API for developing portable parallel programs on heterogeneous hybrid servers composed of a multicore CPU and one or more different types of accelerators. OpenH integrates Pthreads, OpenMP, and OpenACC seamlessly to facilitate the development of hybrid parallel programs. An OpenH hybrid parallel program starts as a single main thread, creating a group of Pthreads called hosting Pthreads. A hosting Pthread then leads the execution of a software component of the program, either an OpenMP multithreaded component running on the CPU cores or an OpenACC (or OpenMP) component running on one of the accelerators of the server. The OpenH library provides API functions that allow programmers to get the configuration of the executing environment and bind the hosting Pthreads (and hence the execution of components) of the program to the CPU cores of the hybrid server to get the best performance. We illustrate the OpenH programming model and library API using two hybrid parallel applications based on matrix multiplication and 2D fast Fourier transform for the most general case of a hybrid hyperthreaded server comprising computing devices. Finally, we demonstrate the practical performance and energy consumption of OpenH for the hybrid parallel matrix multiplication application on a server comprising an Intel Icelake multicore CPU and two Nvidia A40 GPUs.
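
The execution structure described in the abstract (a main thread creating hosting Pthreads, each bound to a CPU core and leading one software component) can be sketched in plain C. This is only an illustrative sketch and does not use the OpenH API, whose function names are not given here: `component_t`, `run_component`, `hosting_thread`, `launch_components`, and `NUM_COMPONENTS` are hypothetical names, the binding relies on the Linux/glibc call `pthread_setaffinity_np`, and the component body is a simple arithmetic stand-in for what would be an OpenMP or OpenACC region in a real hybrid program.

```c
#define _GNU_SOURCE            /* needed for pthread_setaffinity_np and CPU_SET on glibc */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

#define NUM_COMPONENTS 3       /* hypothetical: e.g. one CPU component, two accelerator components */

typedef struct {
    int  id;       /* index of the software component */
    int  core;     /* CPU core the hosting thread asks to be bound to */
    int  bound;    /* 1 if the affinity request succeeded, 0 otherwise */
    long result;   /* value produced by the stand-in component */
} component_t;

/* Stand-in for a software component. In a real hybrid program this would be
   an OpenMP region on the CPU cores or an OpenACC region on an accelerator. */
static long run_component(int id) {
    long sum = 0;
    for (long i = 1; i <= 1000; ++i)
        sum += i * (id + 1);
    return sum;
}

/* Hosting thread: bind itself to one core, then lead its component. */
static void *hosting_thread(void *arg) {
    component_t *c = arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(c->core, &set);
    /* Binding failure is treated as non-fatal; the component still runs. */
    c->bound = (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0);
    c->result = run_component(c->id);
    return NULL;
}

/* Main-thread side: create one hosting thread per component and wait for all.
   Returns 0 on success, -1 if n is out of range or a thread could not be created. */
int launch_components(component_t *comps, int n) {
    assert(comps != NULL);
    if (n < 0 || n > NUM_COMPONENTS)
        return -1;

    pthread_t threads[NUM_COMPONENTS];
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;

    int created = 0, status = 0;
    for (int i = 0; i < n; ++i) {
        comps[i].id = i;
        comps[i].core = (int)(i % ncores);   /* simple round-robin placement */
        comps[i].bound = 0;
        comps[i].result = 0;
        if (pthread_create(&threads[i], NULL, hosting_thread, &comps[i]) != 0) {
            status = -1;
            break;
        }
        ++created;
    }
    for (int i = 0; i < created; ++i)
        pthread_join(threads[i], NULL);
    return status;
}
```

A caller would declare `component_t comps[NUM_COMPONENTS];`, call `launch_components(comps, NUM_COMPONENTS)` from `main`, and then read each `comps[i].result` and `comps[i].bound`. The round-robin core choice is a placeholder; the abstract's point is that the placement should instead be derived from the queried configuration of the server.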

PARRAY

Programming Heterogeneous Systems with Array Types

A parallel computing method for irregular work

Mcrpl: a General Purpose Parallel Raster Processing Library on Distributed Heterogeneous Architectures.

Accelerating Fortran Codes: A Method for Integrating Coarray Fortran with CUDA Fortran and OpenMP

AN PARALLEL AND DISTRIBUTED PROGRAMMING SOLUTION BASED ON HETEROGENEOUS GPU CLUSTER

Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit

Programming Framework for Node Heterogeneous GPU Cluster

OpenH: A Novel Programming Model and API for Developing Portable Parallel Programs on Heterogeneous Hybrid Servers

Reference Capabilities for Safe Parallel Array Programming

High Level Programming for Heterogeneous Architectures

Array-level Collective Communications

mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards

Porting a sparse linear algebra math library to Intel GPUs

A Programming Framework Based on Multi-GPU

Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems

Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms

Towards Polytypic Parallel Programming

A Lightweight Approach to Performance Portability with targetDP

Programming FFT on DSM Multiprocessors

OpenArray V1.0: a Simple Operator Library for the Decoupling of Ocean Modeling and Parallel Computing