The International Journal of High Performance Computing Applications, Ahead of Print.

Abstract: For the past few decades, supercomputers have driven innovations in performance and scaling that benefit many scientific applications. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed, data-driven workflows, primarily because of rigid access methods and restricted configuration management options. Among other features, the cloud's X-as-a-Service model has introduced a developer-centric DevOps approach that empowers developers of artefacts ranging from infrastructure and platforms to software. Contemporary supercomputers unfortunately still lack this approach. We introduce vClusters (versatile software-defined clusters), which are based on Infrastructure-as-Code (IaC) technology. The vClusters approach is a unique fusion of HPC and cloud technologies. It produces a software-defined, multi-tenant cluster on a supercomputing ecosystem which, together with software-defined storage, enables DevOps for complex, data-driven workflows such as grid middleware alongside a classic HPC platform. IaC is commonplace in cloud computing; however, it has lacked adoption within multi-petascale ecosystems because of concerns about performance and about interoperability with the ecosystems of classic HPC data centres. We present an overview of the Swiss National Supercomputing Centre's flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform, which includes an IaC-compliant management system built on a microservices architecture (MSA). We leverage this system to demonstrate vClusters for our diverse operational workflows.
We provide implementation details of two operational vClusters platforms. The first is a classic HPC platform used predominantly by hundreds of users running thousands of large-scale numerical simulation batch jobs. The second is a widely used, data-intensive Grid computing middleware platform used for the operations of CERN's Worldwide LHC Computing Grid (WLCG). The resulting solution showcases the reuse and reduction of common configuration recipes across vCluster implementations. This minimises operational change-management overheads while introducing the flexibility in artefact management that DevOps for diverse workflows requires.
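The sketch below makes the idea of reusing common configuration recipes across vClusters more concrete. It is a minimal Python illustration that assumes a layered model, in which each vCluster is declared as a shared base recipe plus a small set of per-platform overrides. The recipe keys and values are hypothetical placeholders, as are the functions `merge_recipes` and `render_vcluster`. They do not represent the Alps or Shasta configuration format.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Any]


def merge_recipes(base: Config, override: Config) -> Config:
    """Recursively overlay `override` onto `base` without mutating either input."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are nested recipes: merge them key by key.
            result[key] = merge_recipes(result[key], value)
        else:
            # Scalars (or new keys) in the override replace the base value.
            result[key] = deepcopy(value)
    return result


# Hypothetical shared recipe, declared once and inherited by every vCluster.
COMMON_RECIPE: Config = {
    "os_image": "example-base-image",
    "network": {"fabric": "example-fabric", "mtu": 9000},
    "services": {"ssh": True, "monitoring": True},
}

# Hypothetical per-vCluster overrides: only the differences are declared.
VCLUSTER_OVERRIDES: Dict[str, Config] = {
    "hpc": {"services": {"scheduler": "example-batch-scheduler"}},
    "wlcg": {"services": {"grid_middleware": True}, "network": {"mtu": 1500}},
}


def render_vcluster(name: str) -> Config:
    """Produce the full (hypothetical) configuration for one vCluster."""
    return merge_recipes(COMMON_RECIPE, VCLUSTER_OVERRIDES[name])


if __name__ == "__main__":
    for name in VCLUSTER_OVERRIDES:
        print(name, render_vcluster(name))
```

With this layering, the shared part lives in a single place. A change to the common recipe therefore propagates to every platform that inherits it, while each vCluster keeps only its own small delta. This is one plausible reading of how recipe reuse could reduce change-management overhead, not a description of the authors' implementation.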

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times

A Benchmarking Study to Evaluate Apache Spark on Large-Scale Supercomputers

Optimizing the Cray Graph Engine for performant analytics on cluster, SuperDome Flex, Shasta systems and cloud deployment

Alchemy: Distributed Financial Quantitative Analysis System with High‐level Programming Model

Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters

Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows

Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs

Evaluating integration and performance of containerized climate applications on a Hewlett Packard Enterprise Cray system

Introduction to Harp: when Big Data Meets HPC

Supercharging Distributed Computing Environments For High Performance Data Engineering

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

Deploying AI Frameworks on Secure HPC Systems with Containers

Performance comparison of Dask and Apache Spark on HPC systems for Neuroimaging

Employing artificial intelligence to steer exascale workflows with colmena

Exploratory Data Science on Supercomputers for Quantum Mechanical Calculations

Framing Apache Spark in life sciences

Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications

Porting of the DBCSR library for Sparse Matrix-Matrix Multiplications to Intel Xeon Phi systems

AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing

Performance on HPC Platforms Is Possible Without C++