Abstract: Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Because these highly parallel applications are computationally intensive, they must be accelerated by graphics processing units (GPUs) to meet stringent timing constraints. Despite the wide adoption of GPUs, however, efficiently scheduling multiple GPU applications while providing rigorous real-time guarantees remains challenging. Each GPU application has multiple CPU execution and memory copy segments, with GPU kernels running on different hardware resources. Because of the complicated interactions between these heterogeneous segments of parallel tasks, high schedulability is hard to achieve with conventional approaches. This paper proposes RTGPU, which combines fine-grained GPU partitioning on the system side with a novel scheduling algorithm on the theory side. We start by building a model for CPU and memory copy segments. Leveraging persistent threads, we then implement fine-grained GPU partitioning, with interleaved execution improving performance. To reap the benefits of fine-grained GPU partitioning and schedule multiple parallel GPU applications, we propose a novel real-time scheduling algorithm based on federated scheduling and grid search with uniprocessor fixed-priority scheduling. Our approach provides real-time guarantees to meet hard deadlines and achieves over 11% improvement in system throughput and up to 57% improvement in schedulability compared with previous work. We validate and evaluate RTGPU on NVIDIA GPU systems. Our system-side techniques can be applied to mainstream GPUs, and the proposed scheduling theory can be used on general heterogeneous computing platforms that have a similar task execution pattern.
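The abstract names uniprocessor fixed-priority scheduling as one building block of the proposed algorithm. As a rough illustration of that building block only, the sketch below implements the textbook response-time analysis for periodic tasks on a single processor. It is not RTGPU's analysis: the task tuple layout, the function names `response_time` and `schedulable`, and the example task sets are all assumptions made for illustration, and the sketch ignores GPU segments, memory copies, and federated scheduling entirely.

```python
from typing import List, Optional, Tuple

# Hypothetical task layout: (wcet, period, deadline), integer time units.
# Tasks are listed in decreasing priority order, with deadline <= period.
Task = Tuple[int, int, int]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def response_time(tasks: List[Task], i: int) -> Optional[int]:
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    Iterates R = C_i + sum_j ceil(R / T_j) * C_j over all higher-priority
    tasks j until R stops changing or exceeds the deadline.
    """
    wcet, _, deadline = tasks[i]
    r = wcet
    while True:
        interference = sum(ceil_div(r, p) * c for c, p, _ in tasks[:i])
        nxt = wcet + interference
        if nxt > deadline:
            return None  # deadline miss: stop iterating
        if nxt == r:
            return r  # fixed point reached
        r = nxt


def schedulable(tasks: List[Task]) -> bool:
    """True if every task meets its deadline under the given priority order."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


if __name__ == "__main__":
    example = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
    print([response_time(example, i) for i in range(len(example))])
    print(schedulable(example))
```

A full analysis in the spirit of the abstract would apply a test like this to the CPU and memory copy segments while searching over GPU partition sizes; how RTGPU does that is described in the paper itself, not in this sketch.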

Optimizing the Aggregated Throughput of GPUs in Public Clouds Based on Adaptive Kernel Reordering

KubeGPU: efficient sharing and isolation mechanisms for GPU resource management in container cloud

GPU Scheduling for Short Tasks in Private Cloud

FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks With Fine-Grain Utilization

A CPU-GPGPU Scheduler Based on Data Transmission Bandwidth of Workload

ArkGPU: enabling applications’ high-goodput co-location execution on multitasking GPUs

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing

Parallel Transient Stability-Constrained Optimal Power Flow Using GPU as Coprocessor

Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

DxPU: Large Scale Disaggregated GPU Pools in the Datacenter

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

Efficient Kernel Management on GPUs

Exploring the Diversity of Multiple Job Deployments over GPUs for Efficient Resource Sharing

An Empirical-cum-Statistical Approach to Power-Performance Characterization of Concurrent GPU Kernels

Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks

Orchestrating Cache Management and Memory Scheduling for GPGPU Applications

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

Efficient GPU Spatial-Temporal Multitasking