Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Eishi Arima, Minjoon Kang, Issa Saba, Josef Weidendorfer, Carsten Trinitis, Martin Schulz
DOI: https://doi.org/10.1145/3547276.3548630
2024-05-07
Abstract: CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become the first-class design constraint for HPC systems, such co-scheduling techniques should be well-tailored for power-constrained environments. To this end, the industry recently started supporting hardware-level resource partitioning features on modern GPUs for realizing efficient co-scheduling, which can operate with existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions one single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to optimize the combination of chip partitioning, job allocations, as well as power capping based on our scalability/interference modeling while taking a variety of aspects into account, such as compute/memory intensity and utilization in heterogeneous computational resources (e.g., Tensor Cores). The experimental results indicate that our approach is successful in selecting a near-optimal combination across multiple different workloads.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper primarily aims to address the issue of hardware resource partitioning and job allocation optimization on modern GPUs, particularly under power constraints. Specifically, the paper focuses on how to effectively utilize resources in modern heterogeneous CPU-GPU architectures, especially in High-Performance Computing (HPC) environments. Facing challenges such as resource wastage, energy efficiency improvement, and power management, the research proposes a systematic approach to optimize hardware resource partitioning, job allocation, and power limit settings.

### Main Objectives

1. **Improve resource utilization and energy efficiency**: Since a single program usually cannot fully utilize all resources within a node or chip, it is necessary to improve resource utilization and energy efficiency through co-scheduling multiple programs with complementary resource demands.
2. **Adapt to power-constrained environments**: As power consumption becomes the primary design constraint for high-performance computing systems, it is essential to develop scheduling techniques suitable for power-constrained environments.
3. **Leverage modern GPU features**: Modern GPUs, such as the NVIDIA Ampere architecture, support hardware-level resource partitioning features (e.g., Multi-Instance GPU, MIG), making efficient co-scheduling possible on top of existing power limit functionalities.

### Research Contributions

- Proposed the first method to simultaneously optimize hardware resource partitioning, job allocation, and power budgeting, targeting real GPU chips.
- Conducted multiple preliminary observations, revealing that the scalability of applications is significantly affected by memory sharing options, power limit settings, and application characteristics, including the usage of different computational resources (such as tensor cores) and memory/computation intensity.
- Based on these observations, proposed a systematic methodology to optimize resource partitioning, job allocation, and chip-level power budgeting to fit the given problem (or strategy). The methodology includes linear regression performance modeling, model coefficient calibration, and decision-making to select the optimal configuration.

Through the above work, the paper addresses the critical issues of resource utilization and energy efficiency improvement in modern high-performance computing environments, while also considering the requirements of power constraints.
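The three steps named above (linear regression modeling, coefficient calibration, and configuration selection) can be illustrated with a minimal sketch. This is not the paper's actual model: the job names, power caps, sample values, and the objective (sum of predicted relative throughput) are all assumptions made up for illustration, and the sketch ignores interference between co-located jobs and treats every GPC split as valid, which real hardware may not permit.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical calibration samples (invented numbers, not measurements):
# (job, power cap in watts) -> [(number of GPCs, relative throughput)]
SAMPLES = {
    ("compute_bound", 150): [(1, 0.10), (3, 0.30), (7, 0.70)],
    ("compute_bound", 250): [(1, 0.15), (3, 0.45), (7, 1.05)],
    ("memory_bound", 150): [(1, 0.40), (3, 0.50), (7, 0.70)],
    ("memory_bound", 250): [(1, 0.42), (3, 0.54), (7, 0.78)],
}

# Calibration: one linear scalability model per (job, power cap) pair.
MODELS = {key: fit_line(points) for key, points in SAMPLES.items()}


def best_config(models, job_a, job_b, total_gpcs, caps, power_budget):
    """Exhaustively search power caps and GPC splits for two co-located jobs.

    Returns (score, cap, gpcs_a, gpcs_b) for the highest predicted combined
    throughput among caps that fit the budget, or None if no cap fits.
    """
    best = None
    for cap in caps:
        if cap > power_budget:
            continue  # this cap would exceed the allowed chip power
        a0, a1 = models[(job_a, cap)]
        b0, b1 = models[(job_b, cap)]
        for gpcs_a in range(1, total_gpcs):
            gpcs_b = total_gpcs - gpcs_a
            score = (a0 + a1 * gpcs_a) + (b0 + b1 * gpcs_b)
            if best is None or score > best[0]:
                best = (score, cap, gpcs_a, gpcs_b)
    return best


if __name__ == "__main__":
    print(best_config(MODELS, "compute_bound", "memory_bound", 7, [150, 250], 250))
    print(best_config(MODELS, "compute_bound", "memory_bound", 7, [150, 250], 200))
```

With these invented samples, the compute-bound job scales almost linearly with GPC count while the memory-bound job saturates early, so the search assigns most GPCs to the compute-bound job; lowering the power budget forces the lower cap and a lower predicted score. A realistic version would need extra model terms for interference and for the other application characteristics the paper mentions, such as Tensor Core usage.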