Abstract: Modern graphics processing units (GPUs) deliver tremendous computing horsepower by running tens of thousands of threads concurrently. This massively parallel execution model has been effective at hiding the long latency of off-chip memory accesses in graphics and other general computing applications that exhibit regular memory behavior. With the fast-growing demand for general-purpose computing on GPUs (GPGPU), GPU workloads are becoming highly diversified and thus require synergistic coordination of both computing and memory resources to unleash the computing power of GPUs. Accordingly, recent graphics processors have begun to integrate an on-die level-2 (L2) cache. The huge number of threads on GPUs, however, poses significant challenges to L2 cache design. Experiments on a variety of GPGPU applications reveal that the L2 cache may or may not improve overall performance, depending on the characteristics of the application. In this paper, we propose efficient techniques to improve GPGPU performance by orchestrating both the L2 cache and memory in a unified framework. The basic philosophy is to exploit the temporal locality among the massive number of concurrent memory requests and to minimize the impact of memory divergence among simultaneously executed groups of threads. Our major contributions are twofold. First, a priority-based cache management scheme is proposed to maximize the chance that frequently revisited data is kept in the cache. Second, an effective memory scheduling scheme is introduced that reorders memory requests in the memory controller according to their divergence behavior, reducing the average waiting time of warps. Simulation results reveal that our techniques enhance overall performance by 10% on average for memory-intensive benchmarks, with a maximum gain of up to 30%.
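
The intuition behind reordering memory requests by divergence can be illustrated with a toy model. The sketch below is not the scheduler proposed in the paper. It assumes every memory request takes one time unit, that a warp can resume only once all of its outstanding requests are served, and it ignores DRAM banks and row-buffer locality. The function names `average_warp_wait` and `divergence_aware_order` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def average_warp_wait(requests, order):
    """Average time at which each warp's last request completes.

    requests: list of (warp_id, request_id) tuples in arrival order.
    order: list of indices into `requests` giving the service order.
    Assumption: each request takes exactly one time unit.
    """
    finish = {}
    for t, idx in enumerate(order, start=1):
        warp, _ = requests[idx]
        finish[warp] = t  # overwritten until the warp's last request
    return sum(finish.values()) / len(finish)


def divergence_aware_order(requests):
    """Serve warps with fewer outstanding requests first (toy heuristic).

    Requests of the same warp are kept together and in arrival order,
    because Python's sort is stable.
    """
    counts = defaultdict(int)
    for warp, _ in requests:
        counts[warp] += 1
    return sorted(
        range(len(requests)),
        key=lambda i: (counts[requests[i][0]], requests[i][0]),
    )


# Warp 0 is highly divergent (4 requests), warp 1 has 1, warp 2 has 2.
requests = [(0, 0), (0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (2, 1)]

fifo = average_warp_wait(requests, list(range(len(requests))))
aware = average_warp_wait(requests, divergence_aware_order(requests))
print(f"FIFO average wait:             {fifo:.2f}")
print(f"Divergence-aware average wait: {aware:.2f}")
```

In this toy trace, serving the less divergent warps first lowers the average waiting time because short warps no longer queue behind the requests of a highly divergent one. A real memory controller would also have to weigh bank conflicts, row-buffer hits, and fairness.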

Maximizing the Utilization of GPUs Used by Cloud Gaming Through Adaptive Co-location with Combo

CGSharing: Efficient content sharing in GPU-based cloud gaming

FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization

GPU consolidation for cloud games: Are we there yet?

Placing Virtual Machines to Optimize Cloud Gaming Experience

A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines

CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform

A Cloud Gaming System Based on User-Level Virtualization and Its Resource Scheduling

Cost-Efficient and Quality of Experience-aware Player Request Scheduling and Rendering Server Allocation for Edge Computing Assisted Multiplayer Cloud Gaming

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs

Simultaneous Multikernel: Fine-Grained Sharing of GPUs

Vgasa: Adaptive Scheduling Algorithm of Virtualized GPU Resource in Cloud Gaming

Gqos: A QoS-Oriented GPU Virtualization with Adaptive Capacity Sharing

Gremote: Cloud Rendering on GPU Resource Pool Based on API-forwarding

A Cloud-Edge Collaborative Gaming Framework Using AI-Powered Foveated Rendering and Super Resolution

An Economy-Oriented GPU Virtualization with Dynamic and Adaptive Oversubscription

Improving Cloud Gaming Experience through Mobile Edge Computing

A user mode CPU–GPU scheduling framework for hybrid workloads

T-Gaming: A Cost-Efficient Cloud Gaming System at Scale

Orchestrating Cache Management and Memory Scheduling for GPGPU Applications