Paul T. Bauman, Reuben D. Budiardja, Dmytro Bykov, Noel Chalmers, Jacqueline Chen, Nicholas Curtis, Marc Day, Markus Eisenbach, Lucas Esclapez, Alessandro Fanfarillo, William Freitag, Nicholas Frontiere, Antigoni Georgiadou, Joseph Glenski, Kalyana Gottiparthi, Marc T. Henry de Frahan, Gustav R. Jansen, Wayne Joubert, Justin G. Lietz, Jakub Kurzak, Nicholas Malaya, Bronson Messer, Damon McDougall, Paul Mullowney, Stephen Nichols, Matthew Norman, Thomas Papatheodore, Jon Rood, Philip C. Roth, Sarat Sreepathi, James White III, Noah Wolfe

Abstract: The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years in preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. This paper addresses a range of topics in software including programmability, tuning, and portability considerations that are key to moving applications from existing systems to future installations. A set of representative workloads provides case studies for general system and software testing. We evaluate the use of early access systems for development across several generations of hardware. Finally, we discuss how best practices were identified and disseminated to the community through a wide range of activities including user guides and training sessions. We conclude with recommendations for ensuring application readiness on future leadership computing systems.

What problem does this paper attempt to address?

The paper examines the application readiness process and best practices for exascale (10^18 floating-point operations per second) supercomputers. It focuses on Frontier, the Oak Ridge Leadership Computing Facility (OLCF) system built in collaboration with AMD and HPE and designed to sustain double-precision floating-point performance above 1 exaflop. The main objectives of the paper are:

1. **Evaluating the readiness of existing applications on exascale systems**: When the Frontier project was announced in 2019, almost no applications could fully use exascale-level computing power, so existing software development best practices needed to be evaluated and adjusted.
2. **Detailing the experience of preparing scientific applications over the past four years**: This covers programmability, tuning, and portability, which are critical factors in migrating applications from existing systems to future installations.
3. **Providing a set of representative workloads as case studies for general system and software testing**: These case studies show how applications can be adapted to new hardware environments.
4. **Evaluating the use of early access systems for development**: The paper discusses development across several generations of hardware.
5. **Sharing methods for identifying and disseminating best practices**: This knowledge is conveyed to the community through user guides and training.
6. **Offering recommendations to ensure application readiness on future leadership-class computing systems**: Drawing on the experiences above, the paper recommends practices that help applications run well on the next generation of supercomputers.

The paper also describes the Frontier Center of Excellence (COE), which brings together key personnel from HPE, AMD, and ORNL. The COE serves as a center of knowledge and expertise on application readiness and optimization, and as a focal point for application "co-design" that coordinates efforts across different domains. In addition, the paper details work in software testing and preparation, including programming strategies such as AMD's HIP (Heterogeneous-compute Interface for Portability) and OpenMP target offloading, as well as specific optimization cases for multiple scientific applications.

Experiences Readying Applications for Exascale

Early experiences on the OLCF Frontier system with AthenaPK and Parthenon‐Hydro

Deploying Optimized Scientific and Engineering Applications on Exascale Systems

Scaling on Frontier: Uncertainty Quantification Workflow Applications using ExaWorks to Enable Full System Utilization

Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs

Application-Driven Exascale: The JUPITER Benchmark Suite

Exascale Workflow Applications and Middleware: An ExaWorks Retrospective

Ookami: Deployment and Initial Experiences

ECP libraries and tools: An overview

Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

Exascale Computational Fluid Dynamics in Heterogeneous Systems

High Performance Optimization at the Door of the Exascale

Creating Continuous Integration Infrastructure for Software Development on DOE HPC Systems

The ECP ALPINE project: In situ and post hoc visualization infrastructure and analysis capabilities for exascale

Exascale Quantum Mechanical Simulations: Navigating the Shifting Sands of Hardware and Software

Towards Exascale for Wind Energy Simulations

ExaWorks Software Development Kit: A Robust and Scalable Collection of Interoperable Workflow Technologies

The Scientific Impact of the Exascale Computing Project

Snowmass Computational Frontier: Topical Group Report on Experimental Algorithm Parallelization

Application software beyond exascale: challenges and possible trends