OpenMP offload toward the exascale using Intel® GPU Max 1550: evaluation of STREAmS compressible solver
Salvadore, Francesco; Rossi, Giacomo; Bernardini, Matteo
DOI: https://doi.org/10.1007/s11227-024-06254-y
IF: 3.3
2024-06-07
The Journal of Supercomputing
Abstract: Nearly 20 years after the birth of general-purpose GPU computing, the HPC landscape is now dominated by GPUs. After years of undisputed dominance by NVIDIA, new players have entered the arena in a convincing manner, namely AMD and more recently Intel, whose devices currently power the first two clusters in the Top500 ranking. Unfortunately, code porting remains a major problem, made harder by the presence of multiple vendors; at the same time, the emergence of simplified standard paradigms offers an encouraging prospect for developers. In this work, we provide a detailed OpenMP porting strategy for STREAmS, a community code for compressible fluid dynamics. The proposed porting technique is based on the offload functionality of the OpenMP 5.x paradigm and in particular on a hybrid directives/APIs approach that fits seamlessly into the multi-backend software ecosystem of STREAmS. We further carry out a comprehensive performance analysis on the Intel® Data Center GPU Max 1550 (formerly called Ponte Vecchio or PVC). In addition, we analyze the performance of the code on two benchmark clusters powered by PVC, including the exascale Aurora cluster. Performance is evaluated at each level of parallelism involved: the intrinsic parallelism of the PVC tile, the inter-tile parallelism within the GPU, the parallelism between the GPUs within a node, and the parallelism between the nodes within the cluster. The analysis shows that although the implementation complexity of the OpenMP porting is limited, some important guidelines must be followed to achieve satisfactory performance. The PVC GPU shows about 40% higher performance than the NVIDIA A100 or AMD MI250X GPUs, which, however, were released about 3 years earlier. Both intra-node and inter-node scalability show good results. Overall, the introduction of PVC into the GPU computing HPC landscape represents a positive step forward for the diversification and competitiveness of the sector.
computer science, theory & methods, engineering, electrical & electronic, hardware & architecture