FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS

Yueyuan Zhou, ZiYi Ren, En Shao, Lixian Ma, Qiang Hu, Leping Wang, Guangming Tan
DOI: https://doi.org/10.1007/s42514-023-00169-5
2023-09-23
CCF Transactions on High Performance Computing
Abstract: Despite advances in computer hardware, the performance of GROMACS simulations has not improved significantly, primarily because substantial hardware resources are used inefficiently. Resource utilization can be improved through effective scheduling when multiple simulations run concurrently on a single compute node, which particularly benefits the small-scale systems that are frequently simulated. Previous research co-ran multiple GROMACS simulations using time-slicing. However, this approach introduced notable context-switching overhead and focused mainly on GPU resource utilization, neglecting the coordinated scheduling of heterogeneous CPU and GPU devices. Today, several GPU vendors offer hardware partitioning technologies for spatial resource allocation, complementing traditional time-sharing techniques. Moreover, GROMACS is a heterogeneous application that alternates computation between CPU and GPU devices, and its GPU utilization can be as low as 35%. A comprehensive approach that coordinates scheduling across both the GPU and the CPU is therefore essential. To exploit hardware partitioning in line with GROMACS’ runtime characteristics, we propose FILL, a resource scheduling system for co-running multiple GROMACS jobs. FILL uses space partitioning to allocate hardware resources and schedules CPU and GPU resources jointly, ensuring precise and deterministic resource allocation for each GROMACS job. The scheduling aims to improve system throughput while accounting for the turnaround time of individual simulations. Implemented on servers equipped with NVIDIA and AMD GPUs, FILL delivers notable gains in system throughput.
On NVIDIA GPU servers, FILL improves system throughput by up to 167% over the baseline approach and by 27,928% over state-of-the-art alternatives. On AMD GPU servers, it improves throughput by 459% and 24% over the baseline and state-of-the-art methods, respectively. These results demonstrate the effectiveness of FILL in optimizing system throughput for multiple concurrent GROMACS simulations.
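As a rough illustration of the spatial co-scheduling idea described in the abstract, the sketch below pairs each co-running job with its own GPU slice and a disjoint block of CPU cores, then shows how a percentage throughput improvement is computed. This is a hypothetical toy model, not FILL's scheduler: the names `Slot`, `make_slots`, `assign`, `throughput`, and `improvement_pct`, the equal core split, and the one-job-per-slot policy are all assumptions made for illustration. A "GPU slice" stands in for whatever partitioning mechanism the vendor provides (for example NVIDIA MIG instances).

```python
from dataclasses import dataclass


# Hypothetical model of one co-running GROMACS job's resources:
# a GPU slice index (assumed partition, e.g. a MIG instance) plus
# a set of CPU cores that no other slot shares.
@dataclass
class Slot:
    gpu_slice: int
    cpu_cores: list[int]


def make_slots(total_cores: int, gpu_slices: int) -> list[Slot]:
    """Split a node into `gpu_slices` slots, each with its own GPU slice and an
    equal, non-overlapping block of CPU cores (leftover cores stay unused).
    The equal split is an illustrative assumption, not FILL's policy."""
    per_slot = total_cores // gpu_slices
    return [
        Slot(gpu_slice=i, cpu_cores=list(range(i * per_slot, (i + 1) * per_slot)))
        for i in range(gpu_slices)
    ]


def assign(jobs: list[str], slots: list[Slot]) -> dict[str, Slot]:
    """Place at most one job per slot in order; jobs beyond the number of
    slots are simply not placed here (a real scheduler would queue them)."""
    return {job: slot for job, slot in zip(jobs, slots)}


def throughput(jobs_done: int, hours: float) -> float:
    """System throughput measured as completed simulations per hour."""
    return jobs_done / hours


def improvement_pct(new: float, base: float) -> float:
    """Relative improvement in percent, defined as (new / base - 1) * 100."""
    return (new / base - 1.0) * 100.0


if __name__ == "__main__":
    slots = make_slots(total_cores=64, gpu_slices=7)
    placement = assign(["md1", "md2", "md3"], slots)
    for job, slot in placement.items():
        print(job, "-> GPU slice", slot.gpu_slice, "cores", slot.cpu_cores)
    print(improvement_pct(2.67, 1.0))
```

With 64 cores and 7 slices, each slot receives 9 cores and one core is left idle. If improvement is measured as `(new / base - 1) * 100`, a 167% gain corresponds to roughly 2.67 times the baseline throughput.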