Design and realization of hybrid resource management system for heterogeneous cluster
Qinlu He, Fan Zhang, Genqing Bian, Weiqi Zhang, Zhen Li
DOI: https://doi.org/10.1007/s10586-024-04267-z
2024-02-23
Cluster Computing
Abstract: As user-generated data diversifies, a growing share of big data tasks involve unstructured data such as audio, images, and video. Processing this data often requires GPU devices in the cluster. However, most existing cluster resource management frameworks do not effectively decouple CPU resources from GPU resources, and they manage GPU resources at too coarse a granularity, which results in poor GPU sharing and low resource utilization. To address the inability of existing frameworks to support integrated batch and stream processing together with CPU-GPU resource scheduling, we design and implement a Hybrid Heterogeneous Resource Management (H-HRM) system for CPU-GPU heterogeneous clusters. A resource queue binding mechanism on the computing nodes binds CPU resources to GPU resources flexibly, which addresses the difficulty of decoupling the two resource types in the cluster. A Hybrid Domain Resource Fairness (HDRF) model, which distinguishes jobs by whether CPU or GPU is their main resource, is proposed to achieve a reasonable allocation of CPU and GPU resources. Queue stacking, combined with a mechanism that lets multiple executors run simultaneously on one queue, enables fine-grained sharing of GPU resources and improves GPU utilization. Finally, we integrate H-HRM with the Spark programming framework and run tests under real workloads. A comparison with Mesos shows that H-HRM can handle mixed batch and stream processing jobs in the cluster and that the HDRF algorithm greatly improves GPU utilization.
computer science, information systems, theory & methods
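
The abstract does not spell out the HDRF model, so the following is only an illustrative sketch of classic Dominant Resource Fairness (DRF) applied to CPU and GPU resources, the general idea that a "main resource" fairness model builds on. It is not the authors' HDRF algorithm. The function name drf_allocate, the job names, and the cluster sizes are all hypothetical, and skipping a job once its next task no longer fits is a simplification chosen for this example.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Classic DRF by progressive filling (illustrative, not the paper's HDRF).

    capacity: dict resource -> total integer units in the cluster
    demands:  dict job -> dict resource -> integer units needed per task
    Returns (tasks launched per job, dominant share per job).
    """
    used = {r: 0 for r in capacity}
    tasks = {job: 0 for job in demands}
    dominant = {job: Fraction(0) for job in demands}
    active = set(demands)

    while active:
        # Pick the job with the smallest dominant share (ties broken by name).
        job = min(sorted(active), key=lambda j: dominant[j])
        need = demands[job]

        # Assumption: a job whose next task does not fit stops receiving resources.
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            active.remove(job)
            continue

        for r in capacity:
            used[r] += need[r]
        tasks[job] += 1

        # Dominant share = largest fraction of any single resource the job holds.
        dominant[job] = max(
            Fraction(tasks[job] * need[r], capacity[r]) for r in capacity
        )

    return tasks, dominant


# Hypothetical cluster: 18 CPU cores and 9 GPU units.
capacity = {"cpu": 18, "gpu": 9}
demands = {
    "cpu_job": {"cpu": 4, "gpu": 1},  # CPU is this job's main resource
    "gpu_job": {"cpu": 1, "gpu": 3},  # GPU is this job's main resource
}
tasks, shares = drf_allocate(capacity, demands)
print(tasks)   # {'cpu_job': 3, 'gpu_job': 2}
print(shares)  # both jobs end with a dominant share of 2/3
```

In this example the CPU-heavy job and the GPU-heavy job end with equal dominant shares even though they hold different resource mixes. That equalization is what a dominant-resource fairness policy aims for.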