A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code
Catherine Feldman, Smeet Chheda, Alan C. Calder, Eva Siegmann, John Dey, Tony Curtis, Robert J. Harrison
DOI: https://doi.org/10.1145/3569951.3597583
2023-09-09
Abstract: We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran that uses the PARAMESH library to manage a block-structured adaptive mesh. Our initial study enabled standard hugepages (hp) with the Fujitsu compiler only; further investigation allowed us to use hp with multiple compilers by linking to the Fujitsu library libmpg, and to use transparent hugepages (thp) by enabling them at the node level. By comparing hardware counters and in-code timers, we found that neither hp nor thp significantly affects the runtime performance of FLASH. Interestingly, we nevertheless observe a significant reduction in TLB misses, differences in cache and memory access counters, and strange behavior when using thp.
Distributed, Parallel, and Cluster Computing; Performance