Abstract: Heterogeneous computing continues to gain ground in high-performance computing because of its high performance and energy efficiency, and more and more accelerators have emerged, such as GPUs, FPGAs, DSPs, and AI accelerators. Usually, an accelerator is attached to the host CPU as a peripheral device to form a tightly coupled heterogeneous computing node, and a parallel system is then built from multiple such nodes. This organization is computationally efficient but not flexible: when new accelerators appear, it is difficult to add them to a system that has already been built. We address this at two levels. At the hardware level, we create arrays of accelerators and connect them to the existing system through a high-speed network. At the software level, we dynamically organize computing resources from the various arrays to build a virtual heterogeneous computing node, and we provide a standard programming environment on top of it. The result is a more flexible, elastic, and scalable organization of heterogeneous computing. In this paper, we propose SNCL, a supernode OpenCL implementation for hybrid parallel computing systems, in which virtual supernodes can be dynamically constructed across different computing arrays. SNCL implements a standard OpenCL environment on top of RDMA communication over the high-speed interconnect and can be combined with the system-level MPI programming environment, enabling large-scale parallel computing on the hybrid arrays. SNCL is compatible with existing MPI/OpenCL programs, which require no modification. Experiments show that the runtime overhead of the supernode OpenCL environment is very low, and that it is suitable for deploying applications with high computing density and large data scale across different arrays to utilize their computing power without affecting scalability.

Communication‐hiding Programming for Clusters with Multi‐coprocessor Nodes

Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture

SNCL: a Supernode OpenCL Implementation for Hybrid Computing Arrays

Programming Framework for Node Heterogeneous GPU Cluster

Experimentation Procedure for Offloaded Mini-Apps Executed on Cluster Architectures with Xeon Phi Accelerators

Hybrid Parallel Programming Model for Hierarchical NoC

A New Hybrid Hierarchical Parallel Algorithm to Enhance the Performance of Large-Scale Structural Analysis Based on Heterogeneous Multicore Clusters

HeteroPP: A directive‐based heterogeneous cooperative parallel programming framework

Accelerating Communication for Parallel Programming Models on GPU Systems

A Systemic Strategy for Tuning Intra-node Collective Communication on Multicore Systems

A Parallel Programming Interface for Out-of-core Cluster Applications

Numerical Simulation of Planetary Fluid Dynamics on CPU-MIC Heterogeneous Many-Core Systems

An Parallel and Distributed Programming Solution Based on Heterogeneous GPU Cluster

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

FT-Offload: A Scalable Fault-Tolerance Programing Model on MIC Cluster

Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems

A Parallel Programming Environment with Domain Components for Cluster Computing