Implementing Performance Portability of High Performance Computing Programs in the New Golden Age of Chip Architecture

Weifeng Liu, Linping Wu, Xiaowen Xu, Yuren Wang
2023-08-26
Abstract: Performance portability has long been an important goal of high-performance computing. With the end of Moore's Law, it is no longer feasible to improve computer performance simply by adding more of the existing hardware. Innovation in high-performance computer design is therefore imperative, and as a result high-performance computers with multiple architectures coexist in production environments. For example, current high-performance computing nodes often use co-accelerators such as general-purpose GPUs and Intel Xeon Phis to accelerate general-purpose processors. With the flourishing of deep learning, dedicated neural network acceleration chips are also emerging. The emergence of co-accelerators with different architectures, and their wide use in high-performance computers, challenges the performance portability of programs across high-performance computers with different architectures. This article surveys current performance portability techniques from the perspectives of programming models, automatic parallelization of serial code, and automatic conversion of parallel code, among others. At the end, it also summarizes how scientific computing libraries can be used to improve both the performance and the performance portability of a program. Different application scenarios call for different techniques to achieve performance portability. In choosing a performance portability solution for their programs, developers are in effect balancing programming efficiency against optimization results under various constraints.
Hardware Architecture
What problem does this paper attempt to address?
The paper primarily explores methods and techniques for achieving program performance portability in the field of high-performance computing. With the diversification of computing hardware architectures, especially the prevalence of heterogeneous computing systems (combinations of general-purpose processors and coprocessors), ensuring that programs can run efficiently on computing platforms with different architectures has become a significant challenge. The paper discusses the following aspects:

1. **Background Introduction**: Introduces the connection between Zhuangzi's philosophical thought "externalization without internalization, riding on things to roam the mind" and the field of high-performance computing, i.e., algorithms or programs should be able to automatically adapt and perform optimally in different system architectures.
2. **Parallel Programming Models**: Discusses parallel programming models that meet performance portability, including models based on programming interfaces, compiler directives, and programming languages. These models aim to help developers write programs that can adapt to various computing devices.
3. **Automatic Parallelization of Serial Code**: Introduces techniques such as polyhedral compilation and genetic programming to achieve automatic conversion from serial code to parallel code, thereby improving code performance portability.
4. **Automatic Conversion Tools for Parallel Code**: Discusses tools for converting parallel code targeted at one architecture to parallel code for other architectures, such as tools for converting from OpenMP to CUDA or from CUDA to OpenMP.
5. **Other Aspects**: Mentions methods for improving program performance by reasonably calling scientific computing library functions.

In summary, the core issue this paper attempts to address is: how to achieve performance portability of high-performance computing programs in a diversified computing hardware environment.
Working at these different levels, from programming model design to automatic code conversion, the paper explores multiple solutions that aim to help developers overcome the challenges of hardware architecture diversity and improve the cross-platform compatibility and efficiency of their programs.
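
To make the idea of a compiler-directive-based programming model more concrete, here is a minimal sketch in C++ using an OpenMP directive. It is not taken from the paper: the function name `saxpy` and the choice of kernel are hypothetical and used only for illustration. The point it tries to show is that one source file can be built serially or in parallel without changing the loop itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustrative kernel (not from the paper): y = a * x + y.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());

    // Directive-based parallelism: when the compiler has OpenMP enabled
    // (for example with -fopenmp on GCC or Clang), the iterations of this
    // loop are divided among threads. When OpenMP is not enabled, the
    // pragma is ignored and the loop runs serially. Each iteration writes
    // only y[i], so both builds produce the same result.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}` in either build. Offloading the same loop to an accelerator would need additional directives and data-mapping clauses, which this sketch deliberately leaves out.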