Abstract: Performance portability has long been an important goal of high-performance computing. With the failure of Moore's Law, it is no longer feasible to improve computer performance simply by adding more of the existing hardware. Innovation in high-performance computer architecture is therefore imperative, and as a result high-performance computers with multiple architectures coexist in production environments. For example, current high-performance computing nodes often use coprocessors such as general-purpose GPUs and Intel Xeon Phi to accelerate general-purpose processors. With the flourishing of deep learning, dedicated neural network accelerator chips are also emerging. The emergence of coprocessors with different architectures, and their wide adoption in high-performance computers, challenges the performance portability of programs across machines with different architectures. This article surveys current performance portability techniques from the perspectives of programming models, automatic parallelization of serial code, and automatic conversion of parallel code. At the end, it also summarizes how scientific computing libraries can be used to improve the performance and performance portability of a program. Different application scenarios require different techniques to achieve performance portability. When developers choose a performance portability solution for their programs, they are in effect balancing programming efficiency against optimization results under various constraints.

What problem does this paper attempt to address?

The paper primarily explores methods and techniques for achieving program performance portability in the field of high-performance computing. With the diversification of computing hardware architectures, especially the prevalence of heterogeneous computing systems (combinations of general-purpose processors and coprocessors), ensuring that programs run efficiently on computing platforms with different architectures has become a significant challenge. The paper discusses the following aspects:

1. **Background Introduction**: Connects Zhuangzi's philosophical thought "externalization without internalization, riding on things to roam the mind" to high-performance computing: algorithms or programs should be able to adapt automatically and perform optimally on different system architectures.
2. **Parallel Programming Models**: Discusses parallel programming models that support performance portability, including models based on programming interfaces, compiler directives, and programming languages. These models aim to help developers write programs that can adapt to various computing devices.
3. **Automatic Parallelization of Serial Code**: Introduces techniques such as polyhedral compilation and genetic programming for automatically converting serial code to parallel code, thereby improving performance portability.
4. **Automatic Conversion Tools for Parallel Code**: Discusses tools that convert parallel code targeted at one architecture into parallel code for another, such as tools for converting from OpenMP to CUDA or from CUDA to OpenMP.
5. **Other Aspects**: Mentions methods for improving program performance through appropriate use of scientific computing library functions.

In summary, the core issue this paper attempts to address is how to achieve performance portability of high-performance computing programs in a diversified computing hardware environment. Through these techniques at different levels, the paper explores multiple solutions, from programming model design to automatic code conversion, aiming to help developers overcome the challenges brought by hardware architecture diversity and improve the cross-platform compatibility and efficiency of programs.
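
The idea behind interface-based programming models can be illustrated with a minimal sketch, under stated assumptions: the kernel body is written once, and a backend chosen at run time decides how the iteration space is executed. The names `serial_for`, `threaded_for`, `BACKENDS`, and `axpy` are hypothetical and chosen for illustration; they are not taken from the paper or from any specific library. Python threads are used only to show the structure, and no speedup is implied for this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_for(n, body):
    """Reference backend: run body(i) for i in [0, n) on the calling thread."""
    for i in range(n):
        body(i)


def threaded_for(n, body, workers=4):
    """Alternative backend: split [0, n) into contiguous chunks, one task each."""
    def run_chunk(start, stop):
        for i in range(start, stop):
            body(i)

    # Ceiling division; max(1, ...) keeps the range step valid when n == 0.
    chunk = max(1, (n + workers - 1) // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_chunk, start, min(start + chunk, n))
            for start in range(0, n, chunk)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker


# Hypothetical backend registry; a real model would map to CPU, GPU, etc.
BACKENDS = {"serial": serial_for, "threads": threaded_for}


def axpy(backend, a, x, y):
    """Compute a * x + y element-wise; the kernel body is backend-agnostic."""
    out = [0.0] * len(x)

    def body(i):
        # Each index writes a distinct element, so iterations are independent.
        out[i] = a * x[i] + y[i]

    BACKENDS[backend](len(x), body)
    return out
```

Calling `axpy("serial", 2.0, x, y)` and `axpy("threads", 2.0, x, y)` runs the same kernel body through different execution strategies. Keeping the kernel separate from the execution policy is the design choice that lets one source target several devices.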

Implementing Performance Portability of High Performance Computing Programs in the New Golden Age of Chip Architecture

An approach to performance portability through generic programming

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Portability: A Necessary Approach for Future Scientific Software

Taking GPU Programming Models to Task for Performance Portability

High-performance computing: Transitioning from Instruction-Level Parallelism to heterogeneous hybrid architectures

Performance engineering problem in high performance computing

Unified Programming Models for Heterogeneous High-Performance Computers

Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels

A Lightweight Approach to Performance Portability with targetDP

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

Toward Open Repository of Performance Portability of Applications, Benchmarks and Models

On the Parallelization Optimization Strategy for High Performance Computing Software

Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics

Performance Evaluation of Hybrid Programming Patterns for Large CPU/GPU Heterogeneous Clusters

Improving Performance Portability for GPU-specific OpenCL Kernels on Multi-Core/many-core CPUs by Analysis-Based Transformations

MapCG: Writing Parallel Program Portable Between CPU and GPU

Performance on HPC Platforms Is Possible Without C++

Evaluating performance portability of five shared-memory programming models using a high-order unstructured CFD solver

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs