Abstract: CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy efficiency of such systems remains one of the most critical issues. As a single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solution. Meanwhile, as power consumption has become a first-class design constraint for HPC systems, such co-scheduling techniques should be well tailored to power-constrained environments. To this end, the industry has recently started supporting hardware-level resource partitioning features on modern GPUs for realizing efficient co-scheduling, which can operate alongside existing power capping features. For example, NVIDIA's MIG (Multi-Instance GPU) partitions a single GPU into multiple instances at the granularity of a GPC (Graphics Processing Cluster). In this paper, we explicitly target the combination of hardware-level GPU partitioning features and power capping for power-constrained HPC systems. We provide a systematic methodology to optimize the combination of chip partitioning, job allocations, and power capping based on our scalability/interference modeling, while taking a variety of aspects into account, such as compute/memory intensity and the utilization of heterogeneous computational resources (e.g., Tensor Cores). The experimental results indicate that our approach succeeds in selecting a near-optimal combination across multiple different workloads.

What problem does this paper attempt to address?

The paper primarily aims to address the issue of hardware resource partitioning and job allocation optimization on modern GPUs, particularly under power constraints. Specifically, the paper focuses on how to effectively utilize resources in modern heterogeneous CPU-GPU architectures, especially in High-Performance Computing (HPC) environments. Facing challenges such as resource wastage, energy efficiency improvement, and power management, the research proposes a systematic approach to optimize hardware resource partitioning, job allocation, and power limit settings.

### Main Objectives

1. **Improve resource utilization and energy efficiency**: Since a single program usually cannot fully utilize all resources within a node or chip, it is necessary to improve resource utilization and energy efficiency through co-scheduling multiple programs with complementary resource demands.
2. **Adapt to power-constrained environments**: As power consumption becomes the primary design constraint for high-performance computing systems, it is essential to develop scheduling techniques suitable for power-constrained environments.
3. **Leverage modern GPU features**: Modern GPUs, such as the NVIDIA Ampere architecture, support hardware-level resource partitioning features (e.g., Multi-Instance GPU, MIG), making efficient co-scheduling possible on top of existing power limit functionalities.

### Research Contributions

- Proposed the first method to simultaneously optimize hardware resource partitioning, job allocation, and power budgeting, targeting real GPU chips.
- Conducted multiple preliminary observations, revealing that the scalability of applications is significantly affected by memory sharing options, power limit settings, and application characteristics, including the usage of different computational resources (such as tensor cores) and memory/computation intensity.
- Based on these observations, proposed a systematic methodology to optimize resource partitioning, job allocation, and chip-level power budgeting to fit the given problem (or strategy). The methodology includes linear regression performance modeling, model coefficient calibration, and decision-making to select the optimal configuration.

Through the above work, the paper addresses the critical issues of resource utilization and energy efficiency improvement in modern high-performance computing environments, while also considering the requirements of power constraints.
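The three steps named in the summary (linear regression performance modeling, coefficient calibration, and configuration selection) can be illustrated with a minimal Python sketch. This is not the authors' model. The feature set (GPC count and chip power cap), the objective (summed predicted relative throughput of two co-located jobs), and the names `fit_linear`, `predict`, and `choose_config` are all assumptions made for illustration. The sketch also enumerates every GPC split, whereas real MIG exposes only a fixed set of instance profiles.

```python
from itertools import product


def fit_linear(samples):
    """Calibrate perf ~ c0 + c1*gpcs + c2*power_cap by least squares.

    samples: list of (gpcs, power_cap_watts, measured_relative_perf).
    Hypothetical feature set; the paper's model considers more factors.
    """
    rows = [(1.0, float(g), float(p)) for g, p, _ in samples]
    ys = [float(y) for _, _, y in samples]
    n = 3
    # Augmented normal equations [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coef, gpcs, power_cap):
    """Predicted relative performance of one job on one partition."""
    return coef[0] + coef[1] * gpcs + coef[2] * power_cap


def choose_config(coef_a, coef_b, total_gpcs, power_caps, power_budget):
    """Pick the GPC split and chip power cap with the best predicted score.

    Returns (score, gpcs_a, gpcs_b, power_cap), or None if no cap fits
    within the budget. Assumes exactly two co-located jobs.
    """
    feasible_caps = [c for c in power_caps if c <= power_budget]
    best = None
    for gpcs_a, cap in product(range(1, total_gpcs), feasible_caps):
        gpcs_b = total_gpcs - gpcs_a
        score = predict(coef_a, gpcs_a, cap) + predict(coef_b, gpcs_b, cap)
        if best is None or score > best[0]:
            best = (score, gpcs_a, gpcs_b, cap)
    return best


if __name__ == "__main__":
    # Synthetic profiling data (made up for illustration, not measured).
    compute_bound = [(1, 150, 0.37), (2, 150, 0.49), (4, 250, 0.83),
                     (7, 250, 1.19), (3, 200, 0.66)]
    memory_bound = [(1, 150, 0.595), (2, 150, 0.615), (4, 250, 0.705),
                    (7, 250, 0.765), (3, 200, 0.66)]
    coef_a = fit_linear(compute_bound)
    coef_b = fit_linear(memory_bound)
    print(choose_config(coef_a, coef_b, 7, [150, 200, 250], 200))
```

With a purely linear model the job whose performance scales more steeply with GPC count receives most of the chip. Capturing the interference and memory-sharing effects the paper reports would need additional terms that this sketch omits.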

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

Effective GPU Sharing Under Compiler Guidance

Exploring the Diversity of Multiple Job Deployments over GPUs for Efficient Resource Sharing

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems

Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads

SchedTune: A Heterogeneity-Aware GPU Scheduler for Deep Learning

Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads.

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

Parallel Transient Stability-Constrained Optimal Power Flow Using GPU as Coprocessor.

CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems.

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

Intra-Cluster Coalescing and Distributed-Block Scheduling to Reduce GPU NoC Pressure.

HeteroCore GPU to Exploit TLP-Resource Diversity

Power- and Fragmentation-aware Online Scheduling for GPU Datacenters

Resource Scheduling Strategy for Performance Optimization Based on Heterogeneous CPU-GPU Platform

Optimal Workload Placement on Multi-Instance GPUs

An Energy Efficient Task Scheduling Scheme for Heterogeneous GPU-enhanced Clusters

GScheduler: Optimizing Resource Provision by Using GPU Usage Pattern Extraction in Cloud Environments

Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations