Abstract: The International Journal of High Performance Computing Applications, Ahead of Print. For the past few decades, supercomputers have driven innovations in performance and scaling that benefit many scientific applications. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed, data-driven workflows, primarily because of rigid access methods and restricted configuration management options. The X-as-a-Service model of cloud computing has introduced, among other features, a developer-centric DevOps approach that empowers developers of artefacts ranging from infrastructure and platform to software, an approach that contemporary supercomputers unfortunately still lack. We introduce vClusters (versatile software-defined clusters), which are based on infrastructure-as-code (IaC) technology. The vClusters approach is a unique fusion of HPC and cloud technologies, resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem that, together with software-defined storage, enables DevOps for complex, data-driven workflows such as grid middleware, alongside a classic HPC platform. IaC is commonplace in cloud computing; however, it has lacked adoption within multi-petascale ecosystems because of concerns about performance and interoperability with the ecosystems of classic HPC data centres. We present an overview of the Swiss National Supercomputing Centre's flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform, which includes an IaC-compliant, microservices architecture (MSA) management system that we leverage to demonstrate vClusters usage for our diverse operational workflows.
We provide implementation details of two operational vClusters platforms: a classic HPC platform used predominantly by hundreds of users running thousands of large-scale numerical simulation batch jobs; and a widely used, data-intensive grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases the reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change management overheads while introducing flexibility in managing the artefacts that diverse workflows require for DevOps.

Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations

HPC Alongside User-space Kubernetes

Performance on HPC Platforms Is Possible Without C++

Reinventing High Performance Computing: Challenges and Opportunities

On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems

Towards a Comprehensive Framework for Telemetry Data in HPC Environments

Intelligent colocation of HPC workloads

Computational Performance and Energy Efficiency of ARM based HPC servers

Modernizing the HPC System Software Stack

HPM-Frame: A Decision Framework for Executing Software on Heterogeneous Platforms

Preparing for the Future -- Rethinking Proxy Apps

Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows

Interactive and Urgent HPC: Challenges and Opportunities

High-performance computing: Transitioning from Instruction-Level Parallelism to heterogeneous hybrid architectures

Online Resource Management in Thermal and Energy Constrained Heterogeneous High Performance Computing

Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities

Bridging HPC Communities through the Julia Programming Language

Running Cloud-native Workloads on HPC with High-Performance Kubernetes

Performance and Power Efficient Massive Parallel Computational Model for HPC Heterogeneous Exascale Systems

HPX -- An open source C++ Standard Library for Parallelism and Concurrency

Scalable Systems and Software Architectures for High-Performance Computing on cloud platforms