Abstract: High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, a rapidly increasing fraction of the floating-point computing power in leadership-class computing facilities and traditional data centers now comes from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues. The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.

What problem does this paper attempt to address?

The paper addresses how high-energy physics (HEP) experiments can migrate their existing code to heterogeneous computing architectures without maintaining a separate implementation for each one. A rapidly growing share of the floating-point computing power in leadership-class computing facilities and traditional data centers comes from accelerators such as GPUs, while HEP experiments hold millions of lines of code optimized for x86 CPUs. Rewriting that code is a massive undertaking, and the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel, and AMD make it harder still: maintaining several architecture-specific versions is not viable given the available person power. To tackle this issue, the Portable Parallelization Strategies team of the HEP Center for Computational Excellence evaluates several candidate portability layers (Kokkos, SYCL, OpenMP, std::execution::parallel, and alpaka) that promise to run the same source code on multiple hardware architectures. The evaluation uses representative use cases from major HEP experiments, namely DUNE, ATLAS, and CMS. The paper reports performance metrics, porting challenges, API evaluations, and build system integration for these layers, with the aim of guiding the HEP community in choosing software and hardware suites for the next generation of experimental frameworks.
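To make the "same source code on multiple architectures" idea concrete, here is a minimal sketch using OpenMP directives, one of the layers named above. It is not taken from the paper: the function name `scale_and_add` and the kernel itself are hypothetical, chosen only for illustration. The sketch assumes a C++ compiler with optional OpenMP support; when OpenMP is not enabled, the pragma is ignored and the identical source runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel (not from the paper): y[i] = a * x[i] + y[i].
// With OpenMP enabled (for example -fopenmp on GCC or Clang) the loop
// iterations are shared among CPU threads. Without OpenMP the pragma is
// ignored and the same source compiles to a plain serial loop.
void scale_and_add(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both arrays must have the same length
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];  // iterations are independent, so no data race
    }
}
```

Running a loop like this on a GPU would additionally require OpenMP `target` offload directives and a compiler built with offload support; how well that works in practice is the kind of question the evaluation described above is meant to answer.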

Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
