Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters

Peini Liu, Jordi Guitart
DOI: https://doi.org/10.48550/arXiv.2211.11487
2022-11-21
Abstract: Containerization technology offers lightweight OS-level virtualization, and enables portability, reproducibility, and flexibility by packing applications with low performance overhead and low effort to maintain and scale them. Moreover, container orchestrators (e.g., Kubernetes) are widely used in the Cloud to manage large clusters running many containerized applications. However, scheduling policies that consider the performance nuances of containerized High Performance Computing (HPC) workloads have not been well explored yet. This paper proposes fine-grained scheduling policies for containerized HPC workloads in Kubernetes clusters, focusing especially on partitioning each job into a suitable multi-container deployment according to the application profile. We implement our scheduling schemes on different layers of management (application and infrastructure), so that each component has its own focus and algorithms but still collaborates with the others. Our results show that our fine-grained scheduling policies outperform both the baseline policy and the baseline policy with CPU/memory affinity enabled, reducing the overall response time by 35% and 19%, respectively, and also improving the makespan by 34% and 11%, respectively. They also provide better usability and flexibility to specify HPC workloads than other comparable HPC Cloud frameworks, while providing better scheduling efficiency thanks to their multi-layered approach.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the fact that existing scheduling policies in Kubernetes clusters do not fully account for the performance characteristics of containerized high-performance computing (HPC) workloads. Specifically:

1. **Limitations of Existing Scheduling Policies**: Kubernetes was originally designed to manage loosely coupled, long-running online microservices and has limited support for HPC applications. Its default scheduler is optimized for those microservices and falls short for short-lived HPC batch jobs, especially in resource allocation and scheduling efficiency.
2. **Advantages of Multi-Container Deployment Not Fully Utilized**: Some research shows that performance can improve when the processes of an HPC application are divided across multiple containers and each container is restricted to a single NUMA domain or a specific processor. Current cloud orchestration tools, however, have not integrated these deployment schemes.

To address these problems, the paper proposes fine-grained scheduling policies that optimize the scheduling and deployment of containerized HPC workloads in Kubernetes clusters. Specific objectives include:

- **Introducing a Two-Layer Scheduling Architecture**: At the application layer, determine the packaging granularity of HPC workloads according to application characteristics; at the infrastructure layer, implement MPI-aware plugins and task-group scheduling schemes through an enhanced Volcano scheduler.
- **Improving Scheduling Efficiency and Performance**: Through suitable multi-container deployment and resource allocation, reduce the overall response time, improve scheduling efficiency, and provide better usability and flexibility.
- **Supporting Multiple HPC Application Scenarios**: Through the fine-grained scheduling policies, adapt to different types of HPC workloads, such as CPU-intensive, memory-intensive, and network-intensive applications, thereby improving their performance in the cloud environment.

The paper evaluates the proposed fine-grained scheduling policies experimentally. According to the abstract, they reduce the overall response time by 35% and 19%, and improve the makespan by 34% and 11%, relative to the baseline and to the baseline with CPU/memory affinity enabled, respectively.
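
The multi-container partitioning idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `ContainerSpec` type, the `partition_job` function, the container naming scheme, and the even-split rule are all hypothetical, chosen only to show what "one container per NUMA domain" might look like when a job's MPI processes are divided.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerSpec:
    """Hypothetical description of one container in a multi-container job."""
    name: str
    processes: int    # number of MPI processes placed in this container
    numa_domain: int  # NUMA domain the container is intended to be pinned to


def partition_job(job_name: str, total_processes: int,
                  numa_domains: int) -> List[ContainerSpec]:
    """Split a job's MPI processes evenly across one container per NUMA domain.

    Assumption: an even split is a reasonable default. A real policy would
    pick the granularity from the application profile instead.
    """
    if total_processes < 1 or numa_domains < 1:
        raise ValueError("total_processes and numa_domains must be positive")

    # Never create more containers than there are processes to place.
    count = min(total_processes, numa_domains)
    base, extra = divmod(total_processes, count)

    # The first `extra` containers take one additional process each.
    return [
        ContainerSpec(
            name=f"{job_name}-worker-{i}",
            processes=base + (1 if i < extra else 0),
            numa_domain=i,
        )
        for i in range(count)
    ]
```

For example, `partition_job("demo", 16, 2)` yields two containers with 8 processes each, one per NUMA domain. Actually pinning a container to a NUMA domain is the infrastructure layer's job and is outside the scope of this sketch.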