Abstract: The rapidly growing popularity and scale of data-parallel workloads demand a corresponding increase in the raw computational power of GPUs (Graphics Processing Units). As single-GPU systems struggle to satisfy these performance demands, multi-GPU systems have begun to dominate the high-performance computing world. The advent of such systems raises a number of design challenges, including the GPU microarchitecture, multi-GPU interconnect fabrics, runtime libraries, and associated programming models. The research community currently lacks a publicly available and comprehensive multi-GPU simulation framework and benchmark suite for evaluating multi-GPU system design solutions. In this work, we present MGSim, a cycle-accurate, extensively validated multi-GPU simulator based on AMD's Graphics Core Next 3 (GCN3) instruction set architecture. We complement MGSim with MGMark, a suite of multi-GPU workloads that explores multi-GPU collaborative execution patterns. Our simulator is scalable and comes with built-in support for multi-threaded execution to enable fast and efficient simulations. In terms of performance accuracy, MGSim differs by $5.5\%$ on average from actual GPU hardware. We also achieve average speedups of $3.5\times$ in functional emulation and $2.5\times$ in architectural simulation with 4 CPU cores, while delivering the same accuracy as the serial simulation. We illustrate the novel simulation capabilities of our simulator through a case study exploring programming models based on a unified multi-GPU system (U-MGPU) and a discrete multi-GPU system (D-MGPU), both of which use a unified memory space and cross-GPU memory access. We evaluate the design implications of our case study, which suggest that D-MGPU is an attractive programming model for future multi-GPU systems.


MGSim + MGMark: A Framework for Multi-GPU System Research
