Adapting AREPO-RT for Exascale Computing: GPU Acceleration and Efficient Communication

Oliver Zier, Rahul Kannan, Aaron Smith, Mark Vogelsberger, Erkin Verbeek
DOI: https://doi.org/10.1093/mnras/stae1837
IF: 4.8
2024-07-29
Monthly Notices of the Royal Astronomical Society
Abstract: Radiative transfer (RT) is a crucial ingredient for self-consistent modelling of numerous astrophysical phenomena across cosmic history. However, on-the-fly integration into radiation-hydrodynamics (RHD) simulations is computationally demanding, particularly due to the stringent time-stepping conditions and increased dimensionality inherent in multi-frequency collisionless Boltzmann physics. The emergence of exascale supercomputers, equipped with large numbers of CPU cores and GPU accelerators, offers new opportunities for enhancing RHD simulations. We present the first steps towards optimizing AREPO-RT for such high-performance computing environments. We implement a novel node-to-node communication strategy that uses shared memory to replace intra-node communication with direct memory access. In addition, combining multiple inter-node messages into a single message substantially improves network bandwidth utilization and performance for large-scale simulations on modern supercomputers. The single-message node-to-node approach also improves performance on smaller-scale machines with less optimized networks. Moreover, by moving all RT-related calculations to GPUs, we achieve a computational speedup of around 15 for standard benchmarks compared to the original CPU implementation. As a case study, we perform cosmological RHD simulations of the Epoch of Reionization, employing a setup similar to that of the THESAN project. In this context, RT becomes sub-dominant, so that even without modifying the core AREPO codebase, overall efficiency improves threefold. The advancements presented here have broad implications, potentially transforming the complexity and scalability of future simulations for a wide variety of astrophysical studies. Our work serves as a blueprint for porting similar simulation codes based on unstructured resolution elements to GPU-centric architectures.
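The node-to-node communication strategy summarized in the abstract can be illustrated with a minimal sketch. This is not the AREPO-RT implementation. It assumes a hypothetical block placement of ranks on nodes, a hypothetical record layout (source rank, destination rank, payload length, then payload), and plain Python byte buffers standing in for MPI communication. Messages whose destination rank lives on the same node are set aside for direct delivery (in an MPI code this would typically go through an MPI-3 shared-memory window, e.g. one created with `MPI_Win_allocate_shared`). All messages bound for the same remote node are packed into one contiguous buffer, so each pair of nodes exchanges a single message.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

# (source rank, destination rank, payload); a hypothetical message representation
Message = Tuple[int, int, bytes]

# Hypothetical per-record header: source rank, destination rank, payload length in bytes
HEADER = struct.Struct("<iiI")


def node_of(rank: int, ranks_per_node: int) -> int:
    # Assumes block placement: ranks 0..ranks_per_node-1 on node 0, and so on.
    return rank // ranks_per_node


def pack_node_to_node(
    messages: List[Message], my_node: int, ranks_per_node: int
) -> Tuple[List[Message], Dict[int, bytes]]:
    """Split outgoing messages of one node into intra-node messages and
    one packed buffer per remote destination node."""
    local: List[Message] = []
    buffers: Dict[int, bytearray] = defaultdict(bytearray)
    for src, dst, payload in messages:
        dst_node = node_of(dst, ranks_per_node)
        if dst_node == my_node:
            # Same node: would be a direct copy via shared memory, not a network message.
            local.append((src, dst, payload))
        else:
            # Remote node: append header + payload to that node's single buffer.
            buffers[dst_node] += HEADER.pack(src, dst, len(payload)) + payload
    return local, {node: bytes(buf) for node, buf in buffers.items()}


def unpack_node_message(buf: bytes) -> List[Message]:
    """Recover the individual (src, dst, payload) records from one packed buffer."""
    out: List[Message] = []
    offset = 0
    while offset < len(buf):
        src, dst, n = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        out.append((src, dst, buf[offset:offset + n]))
        offset += n
    return out


if __name__ == "__main__":
    # Two ranks per node: node 0 holds ranks 0 and 1.
    msgs = [(0, 1, b"a"), (0, 2, b"bb"), (1, 3, b"ccc"), (1, 2, b"d"), (0, 4, b"ee")]
    local, packed = pack_node_to_node(msgs, my_node=0, ranks_per_node=2)
    print(local)           # [(0, 1, b'a')]
    print(sorted(packed))  # [1, 2]
```

In this example the five outgoing messages of node 0 become one local copy plus two network messages (to nodes 1 and 2) instead of four. Sending fewer, larger messages generally amortizes per-message network latency and uses bandwidth more effectively, at the cost of an extra packing and unpacking copy on each side.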
astronomy & astrophysics