Abstract: Due to the cost-effective, massive computational power of graphics processing units (GPUs), there is growing interest in utilizing GPUs in real-time systems. For example, GPUs have been applied to automotive systems to enable new advanced and intelligent driver assistance technologies, accelerating the path to self-driving cars. In such systems, GPUs are shared among tasks with mixed timing constraints: real-time (RT) tasks that have to be accomplished before specified deadlines, and non-real-time, best-effort (BE) tasks. In this paper, (1) we propose resource-aware non-uniform slack distribution to enhance the schedulability of RT tasks (the total amount of work of RT tasks whose deadlines can be satisfied on a given amount of resources) in GPU-enabled systems; (2) we propose deadline-aware dynamic GPU partitioning to allow RT and BE tasks to run on a GPU simultaneously, such that BE tasks are not blocked for a long time. We evaluate the effectiveness of the proposed approaches using both synthetic benchmarks and a real-world workload that consists of a set of emerging automotive tasks. Experimental results show that the proposed approaches significantly improve the schedulability of RT tasks and reduce the turnaround time of BE tasks. Moreover, the analysis of two driving scenarios shows that these improvements can significantly enhance driving safety and experience. For example, when the resource-aware non-uniform slack distribution approach is used, the distance that a car travels between the moment a traffic sign (pedestrian) is seen and the moment it is recognized decreases from 44.4m to 22.2m (from 4.4m to 2.2m); when the deadline-aware dynamic GPU partitioning approach is used, the distance that the car travels before a drowsy driver is woken up is reduced from 56.2m to 29.2m.
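
The distances quoted in the abstract can be related to task latency with simple arithmetic: at constant speed, the distance travelled equals speed times elapsed time, so a shorter distance means a proportionally shorter latency. The sketch below illustrates this relation only. The abstract does not state the vehicle speed, so `ASSUMED_SPEED_MPS` (80 km/h) is a hypothetical value chosen for illustration, and the function names are likewise not from the paper.

```python
# Minimal sketch: relate the distances in the abstract to latency.
# ASSUMPTION: the car moves at constant speed. The speed itself is NOT
# given in the abstract; 80 km/h is a hypothetical value for illustration.
ASSUMED_SPEED_MPS = 80 / 3.6  # 80 km/h converted to metres per second


def implied_latency(distance_m: float, speed_mps: float) -> float:
    """Time (s) needed to cover distance_m at a constant speed_mps."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return distance_m / speed_mps


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from before to after (0.5 means halved)."""
    return (before - after) / before


# (label, distance before, distance after) as quoted in the abstract.
SCENARIOS = [
    ("traffic sign", 44.4, 22.2),
    ("pedestrian", 4.4, 2.2),
    ("drowsy driver", 56.2, 29.2),
]

if __name__ == "__main__":
    for label, before, after in SCENARIOS:
        t_before = implied_latency(before, ASSUMED_SPEED_MPS)
        t_after = implied_latency(after, ASSUMED_SPEED_MPS)
        cut = relative_reduction(before, after)
        print(f"{label}: {t_before:.2f}s -> {t_after:.2f}s "
              f"({cut:.0%} shorter, speed assumed)")
```

The relative reduction does not depend on the assumed speed: the two slack-distribution figures are halved, and the drowsy-driver figure drops by roughly 48%. Only the absolute latencies printed by the script depend on the hypothetical speed.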

QoS-aware Dynamic Resource Allocation with Improved Utilization and Energy Efficiency on GPU

Gqos: A QoS-Oriented GPU Virtualization with Adaptive Capacity Sharing

Power Aware Job Scheduling with Quality of Service Guarantees: A Preliminary Study

QoS-Aware Scheduling of Remote Rendering for Interactive Multimedia Applications in Edge Computing

Quality of Service Support for Fine-Grained Sharing on GPUs

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs

A user mode CPU–GPU scheduling framework for hybrid workloads

Towards QoS-Aware and Resource-Efficient GPU Microservices Based on Spatial Multitasking GPUs In Datacenters

Design and realization of hybrid resource management system for heterogeneous cluster

GPU Energy optimization based on task balance scheduling

Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems

SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs

A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines

Cost-Constrained QoS Optimization for Approximate Computation Real-Time Tasks in Heterogeneous MPSoCs

Energy-Aware Task Scheduling Strategies with QoS Constraint for Green Computing in Cloud Data Centers

An Energy-Efficient Task Scheduling Method for CPU-GPU Heterogeneous Cloud

DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

GScheduler: Optimizing Resource Provision by Using GPU Usage Pattern Extraction in Cloud Environments

Effective GPU Sharing Under Compiler Guidance

Towards Minimizing Resource Usage with QoS Guarantee in Cloud Gaming

Run-Time Performance Estimation and Fairness-Oriented Scheduling Policy for Concurrent GPGPU Applications