ECP libraries and tools: An overview

Michael A Heroux, Lois Curfman McInnes, James Ahrens, Todd Gamblin, Timothy C Germann, Xiaoye Sherry Li, Kathryn Mohror, Todd Munson, Sameer Shende, Rajeev Thakur, Jeffrey Vetter, James Willenbring
DOI: https://doi.org/10.1177/10943420241271005
2024-09-15
The International Journal of High Performance Computing Applications (Ahead of Print)
Abstract: The Exascale Computing Project (ECP) Software Technology and Co-Design teams addressed the growing complexity of high-performance computing (HPC) by developing scalable software libraries and tools that leverage the capabilities of exascale systems. As we enter the exascale era, the need for reusable, optimized software that can handle the unique challenges posed by these systems becomes increasingly important. The primary challenge the ECP teams faced was to create software libraries and tools that are both performant on exascale architectures and portable and usable across diverse hardware platforms. Efforts addressed concurrent execution, memory management, and the integration of heterogeneous computing resources such as GPUs from multiple vendors. The ECP's strategy involved a structured development process encompassing the creation, optimization, and deployment of software in collaboration with industry, academia, and national laboratories. The project was organized into several technical areas: co-design of domain-specific suites with target applications, programming models and runtimes, development tools, mathematical libraries, data and visualization tools, and software ecosystem and delivery mechanisms. ECP has successfully developed a large portfolio of software libraries and tools that demonstrate significant improvements in performance and scalability on exascale systems. These products have been integrated into the Department of Energy's computing facilities, where they support a range of scientific applications and deliver robust performance across different hardware configurations. ECP's advances in exascale software development highlight the importance of a collaborative and adaptive approach to handling the complexities of next-generation HPC systems.
The lessons learned emphasize the need for continuous engagement with end users and vendors and the importance of balancing innovation with practical implementation. Future efforts will focus on ensuring scalability, keeping pace with rapid hardware advances, and further enhancing the interoperability and usability of the software ecosystem. Subsequent articles in this special issue provide in-depth discussions and case studies of specific library and tool efforts.
Subject categories: Computer Science, Theory & Methods; Computer Science, Interdisciplinary Applications; Computer Science, Hardware & Architecture