Abstract: Containerization technology offers lightweight OS-level virtualization and enables portability, reproducibility, and flexibility by packaging applications with low performance overhead and little effort to maintain and scale them. Moreover, container orchestrators (e.g., Kubernetes) are widely used in the Cloud to manage large clusters running many containerized applications. However, scheduling policies that consider the performance nuances of containerized High Performance Computing (HPC) workloads have not yet been well explored. This paper proposes fine-grained scheduling policies for containerized HPC workloads in Kubernetes clusters, focusing especially on partitioning each job into a suitable multi-container deployment according to the application profile. We implement our scheduling schemes on different layers of management (application and infrastructure), so that each component has its own focus and algorithms but still collaborates with the others. Our results show that our fine-grained scheduling policies outperform both the baseline policy and the baseline with CPU/memory affinity enabled, reducing the overall response time by 35% and 19%, respectively, and improving the makespan by 34% and 11%, respectively. They also provide better usability and flexibility for specifying HPC workloads than other comparable HPC Cloud frameworks, while providing better scheduling efficiency thanks to their multi-layered approach.
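The two metrics quoted in the abstract can be made concrete with a small sketch. It assumes the common definitions (response time of a job = finish time minus submit time; makespan = latest finish minus earliest submit); the abstract does not spell out its exact definitions, and the `JobRecord` type and function names below are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-job timing record (times in seconds)."""
    name: str
    submit: float
    finish: float


def overall_response_time(jobs: List[JobRecord]) -> float:
    # Assumed definition: sum over jobs of (finish - submit),
    # i.e. waiting time plus execution time for every job.
    return sum(j.finish - j.submit for j in jobs)


def makespan(jobs: List[JobRecord]) -> float:
    # Assumed definition: time from the first submission
    # until the last job finishes.
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def improvement_percent(baseline: float, candidate: float) -> float:
    # Relative reduction of a "lower is better" metric, in percent.
    return 100.0 * (baseline - candidate) / baseline
```

Under these assumptions, a policy whose overall response time is 65 where the baseline's is 100 (in the same units) would be reported as a 35% reduction.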

What problem does this paper attempt to address?

The paper addresses the following problem: existing scheduling policies in Kubernetes clusters do not fully account for the performance characteristics of containerized high-performance computing (HPC) workloads. Specifically:

1. **Limitations of Existing Scheduling Policies**: Kubernetes was initially designed to manage loosely coupled, long-running online microservices and has limited support for HPC applications. The default scheduler is optimized mainly for such microservices and is insufficient for short-lived HPC batch jobs, especially in terms of resource allocation and scheduling efficiency.
2. **Advantages of Multi-Container Deployment Not Fully Utilized**: Some research shows that performance can improve when the processes of an HPC application are divided among multiple containers and each container is restricted to a single NUMA domain or a specific processor. However, current cloud orchestration tools have not integrated these deployment schemes.

To solve these problems, the paper proposes fine-grained scheduling policies that optimize the scheduling and deployment of containerized HPC workloads in Kubernetes clusters. Specific objectives include:

- **Introducing a Two-Layer Scheduling Architecture**: At the application layer, determine the packaging granularity of HPC workloads according to application characteristics; at the infrastructure layer, implement MPI-aware plugins and task-group scheduling schemes through an enhanced Volcano scheduler.
- **Improving Scheduling Efficiency and Performance**: Through suitable multi-container deployment and resource allocation, reduce the overall response time, improve scheduling efficiency, and provide better usability and flexibility.
- **Supporting Multiple HPC Application Scenarios**: Through the fine-grained scheduling policies, adapt to different types of HPC workloads, such as CPU-intensive, memory-intensive, and network-intensive applications, thereby improving their performance in the cloud environment.

The paper verifies the effectiveness of the proposed policies through experiments. Compared with the baseline and with the baseline with CPU/memory affinity enabled, they reduce the overall response time by 35% and 19%, respectively, and improve the makespan by 34% and 11%, respectively.
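The application-layer decision described above, choosing a packaging granularity and splitting a job's processes into containers, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `NodeShape` and `ContainerSpec` types, the `partition` function, and the three granularity names (`"node"`, `"numa"`, `"core"`) are all hypothetical, and how a given application profile maps to a granularity is left to the caller because the summary does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeShape:
    """Hypothetical description of one worker node."""
    numa_domains: int
    cores_per_domain: int


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical plan entry: one container of the deployment."""
    index: int
    processes: int
    numa_domain: Optional[int]  # None means "not pinned to a domain"


def partition(num_processes: int, node: NodeShape,
              granularity: str) -> List[ContainerSpec]:
    """Split an MPI job into containers at the chosen granularity.

    Assumed granularities (illustrative only):
      "node": one container may fill a whole node, no NUMA pinning
      "numa": one container per NUMA domain
      "core": one process per container, pinned within a domain
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    if granularity == "node":
        capacity = node.numa_domains * node.cores_per_domain
    elif granularity == "numa":
        capacity = node.cores_per_domain
    elif granularity == "core":
        capacity = 1
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")

    specs: List[ContainerSpec] = []
    remaining = num_processes
    index = 0
    while remaining > 0:
        count = min(capacity, remaining)
        if granularity == "node":
            domain = None
        elif granularity == "numa":
            # Domains are reused round-robin once a node is full.
            domain = index % node.numa_domains
        else:
            # Fill one domain's cores before moving to the next.
            domain = (index // node.cores_per_domain) % node.numa_domains
        specs.append(ContainerSpec(index, count, domain))
        remaining -= count
        index += 1
    return specs
```

For example, with `NodeShape(numa_domains=2, cores_per_domain=8)`, a 16-process job at `"numa"` granularity yields two 8-process containers, one per NUMA domain, whereas `"node"` granularity yields a single unpinned 16-process container. An infrastructure-layer scheduler would then place the resulting containers as a group.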

Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters
