Abstract: Despite advances in computer hardware, the performance of GROMACS simulations has not improved significantly, primarily because the simulations use the available hardware resources inefficiently. Resource utilization can be raised by scheduling multiple simulations to run concurrently on a single computing node, which particularly benefits the small-scale system simulations that are frequently run. Previous research co-ran multiple GROMACS simulations using time slicing. However, this approach introduced notable context-switching overhead and concentrated on GPU resource utilization while neglecting the collaborative scheduling of heterogeneous CPU and GPU devices. GPU vendors now offer hardware partitioning technologies for spatial resource allocation, complementing traditional time-sharing techniques. Moreover, GROMACS is a heterogeneous computing application that alternates computation between the CPU and the GPU, and its GPU utilization is sometimes as low as 35%. Consequently, coordinated scheduling of both the GPU and the CPU is needed. To exploit hardware partitioning in line with GROMACS’ runtime characteristics, we propose FILL, a resource scheduling system for co-running multiple GROMACS jobs. FILL uses space partitioning to allocate hardware resources and schedules CPU and GPU resources collaboratively, giving each GROMACS job a precise and deterministic resource allocation. The scheduling aims to improve system throughput while taking the turnaround time of simulations into account. We implemented FILL on servers equipped with NVIDIA and AMD GPUs. On NVIDIA GPU servers, FILL improved system throughput by up to 167% over the baseline approach and 27,928% over state-of-the-art alternatives. On AMD GPU servers, the improvements were 459% and 24% over the baseline and state-of-the-art methods, respectively. These results validate the effectiveness of FILL in improving system throughput for multiple GROMACS simulations.
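
The idea of spatially partitioning a node among co-running jobs can be illustrated with a minimal sketch. It assumes an even, static split of CPU cores and GPU compute among the jobs; the abstract does not describe FILL's actual scheduling policy, so this is not the authors' algorithm. The names `Allocation` and `partition_node` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Allocation:
    """Resources reserved for one co-running job (hypothetical structure)."""
    job: str
    cpu_cores: Tuple[int, ...]  # core IDs reserved exclusively for this job
    gpu_percent: int            # share of GPU compute given to this job, in percent


def partition_node(jobs: List[str], total_cores: int) -> List[Allocation]:
    """Split CPU cores and GPU compute evenly among jobs, with no overlap.

    Assumption: an even static split. A real scheduler would size each
    partition from the job's measured CPU and GPU demand.
    """
    if not jobs:
        return []
    if len(jobs) > total_cores:
        raise ValueError("more jobs than CPU cores")
    cores_per_job = total_cores // len(jobs)
    gpu_percent = 100 // len(jobs)
    allocations = []
    for i, job in enumerate(jobs):
        first = i * cores_per_job
        cores = tuple(range(first, first + cores_per_job))
        allocations.append(Allocation(job, cores, gpu_percent))
    return allocations


if __name__ == "__main__":
    for alloc in partition_node(["sim_a", "sim_b", "sim_c"], total_cores=16):
        print(alloc)
```

Because each job receives a disjoint set of cores and a fixed GPU share, its resources do not change while other jobs start or finish, which is the sense in which a spatial allocation is deterministic. Leftover cores (one of the 16 in the usage example) simply stay idle in this sketch.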

FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS
