Abstract: Due to the cost-effective, massive computational power of graphics processing units (GPUs), there is growing interest in using GPUs in real-time systems. For example, GPUs have been applied to automotive systems to enable new advanced and intelligent driver assistance technologies, accelerating the path to self-driving cars. In such systems, GPUs are shared among tasks with mixed timing constraints: real-time (RT) tasks that must complete before specified deadlines, and non-real-time, best-effort (BE) tasks. In this paper, (1) we propose resource-aware non-uniform slack distribution to enhance the schedulability of RT tasks (the total amount of RT work whose deadlines can be satisfied on a given amount of resources) in GPU-enabled systems; (2) we propose deadline-aware dynamic GPU partitioning to allow RT and BE tasks to run on a GPU simultaneously, so that BE tasks are not blocked for a long time. We evaluate the effectiveness of the proposed approaches using both synthetic benchmarks and a real-world workload that consists of a set of emerging automotive tasks. Experimental results show that the proposed approaches yield a significant schedulability improvement for RT tasks and a significant reduction in turnaround time for BE tasks. Moreover, the analysis of two driving scenarios shows that these improvements can significantly enhance driving safety and the driving experience. For example, when the resource-aware non-uniform slack distribution approach is used, the distance that a car travels between the time a traffic sign (pedestrian) is "seen" and the time it is "recognized" decreases from 44.4 m to 22.2 m (from 4.4 m to 2.2 m); when the deadline-aware dynamic GPU partitioning approach is used, the distance that the car travels before a drowsy driver is woken up is reduced from 56.2 m to 29.2 m.

Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems.
