Abstract: The performance of graphics processing unit (GPU) systems is improving rapidly to accommodate the increasing demands of graphics and high-performance computing applications. With this performance improvement, however, the power consumption of GPU systems has increased dramatically. Up to 30% of the total power of a GPU system is consumed by the graphics memory itself. Therefore, reducing graphics memory power consumption is critical to addressing this power challenge. In this article, we propose an energy-efficient reconfigurable 3D die-stacking graphics memory design that integrates wide-interface graphics DRAMs side-by-side with a GPU processor on a silicon interposer. The proposed architecture is a "3D+2.5D" system: the DRAM memory itself is 3D-stacked with through-silicon vias (TSVs), whereas the DRAM and the GPU processor are integrated through a silicon interposer (2.5D). Since GPU computing units, memory controllers, and memory are all integrated in the same package, the number of memory I/Os is no longer constrained by the package's pin count. This allows us to reduce memory power consumption by scaling down the supply voltage and frequency of the memory interface while maintaining the same or even higher peak memory bandwidth. In addition, we design a reconfigurable memory interface that can dynamically adapt to the requirements of various applications. We propose two reconfiguration mechanisms to optimize GPU system energy efficiency and throughput, respectively, and thus benefit both memory-intensive and compute-intensive applications. The experimental results show that the proposed GPU memory architecture improves GPU system energy efficiency by 21% without reconfiguration. The reconfigurable memory interface can further improve system energy efficiency by 26% and system throughput by 31% under a capped system power budget of 240 W.
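The central trade-off in the abstract (a wider, in-package interface run at lower voltage and frequency can match or exceed the peak bandwidth of a narrow, fast off-package bus while consuming less I/O power) can be checked with first-order arithmetic. The Python sketch below assumes the textbook dynamic-power model P = αCV²f per pin and uses made-up pin counts, data rates, voltages, and capacitance; none of these numbers come from the paper. The names `InterfaceConfig`, `pick_efficient`, and `pick_throughput` are hypothetical, and the two `pick_*` functions are toy stand-ins for an energy-efficiency mode and a throughput mode, not the authors' reconfiguration mechanisms.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InterfaceConfig:
    """One hypothetical memory-interface operating point (illustrative values only)."""
    name: str
    width_bits: int        # number of data I/O pins
    rate_gbps: float       # per-pin data rate in Gb/s
    vdd: float             # I/O supply voltage in volts
    cap_pf: float = 1.0    # assumed switched capacitance per pin, in pF
    activity: float = 0.5  # assumed switching-activity factor

    def peak_bandwidth_gbs(self) -> float:
        # Peak bandwidth in GB/s = pins * per-pin rate (Gb/s) / 8 bits per byte.
        return self.width_bits * self.rate_gbps / 8.0

    def io_power_w(self) -> float:
        # First-order dynamic power summed over all pins: P = a * C * V^2 * f,
        # with f taken as the per-pin data rate.
        return (self.width_bits * self.activity * self.cap_pf * 1e-12
                * self.vdd ** 2 * self.rate_gbps * 1e9)

    def energy_pj_per_bit(self) -> float:
        # Energy per transferred bit = P / (bits per second), which reduces to a * C * V^2.
        bits_per_s = self.width_bits * self.rate_gbps * 1e9
        return self.io_power_w() / bits_per_s * 1e12


def pick_efficient(configs: Sequence[InterfaceConfig],
                   min_bandwidth_gbs: float) -> Optional[InterfaceConfig]:
    """Hypothetical efficiency policy: lowest pJ/bit that still meets the bandwidth demand."""
    ok = [c for c in configs if c.peak_bandwidth_gbs() >= min_bandwidth_gbs]
    return min(ok, key=lambda c: c.energy_pj_per_bit()) if ok else None


def pick_throughput(configs: Sequence[InterfaceConfig],
                    io_power_budget_w: float) -> Optional[InterfaceConfig]:
    """Hypothetical throughput policy: highest bandwidth within an I/O power budget."""
    ok = [c for c in configs if c.io_power_w() <= io_power_budget_w]
    return max(ok, key=lambda c: c.peak_bandwidth_gbs()) if ok else None


# A narrow, fast (off-package style) bus versus wide, slower (interposer style) points.
narrow = InterfaceConfig("narrow-fast", width_bits=384, rate_gbps=7.0, vdd=1.5)
wide = InterfaceConfig("wide-slow", width_bits=3072, rate_gbps=0.875, vdd=1.2)
wide_boost = InterfaceConfig("wide-boost", width_bits=3072, rate_gbps=1.25, vdd=1.35)
configs = [narrow, wide, wide_boost]

for c in configs:
    print(f"{c.name}: {c.peak_bandwidth_gbs():.0f} GB/s, "
          f"{c.io_power_w():.2f} W, {c.energy_pj_per_bit():.3f} pJ/bit")
print("efficiency mode picks:", pick_efficient(configs, 300).name)
print("throughput mode picks:", pick_throughput(configs, 3.5).name)
```

With these illustrative numbers, the wide interface matches the narrow one's 336 GB/s at about 1.94 W instead of 3.02 W. Because energy per bit reduces to αCV² in this model, the saving comes from the lower supply voltage that the lower per-pin rate permits (and, in a real interposer design, from shorter, lower-capacitance wires), not from the lower frequency alone.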

GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture

GPU Performance Optimization Targeting OpenCL Model

An Experimental GPU Global Memory Performance Estimation and Optimization

Quantitative GPGPU Performance Model Targeting OpenCL Architecture

Orchestrating Cache Management and Memory Scheduling for GPGPU Applications

GPU Memory Optimization Through Program Restructuring Methods

A Polyhedral Modeling Based Source-to-Source Code Optimization Framework for GPGPU

Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures

Performance Enhancement of GPU Parallel Computing Using Memory Allocation Optimization

Equidistant Memory Access Coalescing on GPGPU

A Performance Model for General-Purpose Computation on GPU

A Framework for Memory Oversubscription Management in Graphics Processing Units

Workload Analysis for Typical GPU Programs Using CUPTI Interface

Thread Batching for High-performance Energy-efficient GPU Memory Design

ICCAD: U: Optimizing GPU Shared Memory Allocation in Automated C-to-CUDA Compilation

Analyzing Memory Access on CPU-GPGPU Shared LLC Architecture

A Performance Model for GPU Architectures That Considers On-Chip Resources: Application to Medical Image Registration

A perceptual and predictive batch-processing memory scheduling strategy for a CPU-GPU heterogeneous system

Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface

Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning