Abstract: This paper studies performance improvements in memory architectures for three-dimensional chip multi-processors (3D CMPs). As CMPs integrate more and more cores, the memory subsystem comes under heavy data-access pressure, and designers face the challenge of feeding enough data to a massive number of on-die cores. Three-dimensional integrated circuits (3D ICs) can stack memories built in different process technologies within the same chip. Fine-pitch through-silicon vias (TSVs) can increase the bandwidth of the stacked memory, which mitigates the pressure on the I/O infrastructure of CMPs. We begin by reviewing the potential benefits of 3D integration and recent advances in research on memory architectures for 3D CMPs. Both large caches and main memories can be stacked in 3D CMPs, so we examine these memory architectures in two respects: stacked cache architecture and stacked main memory architecture. In the same area footprint, 3D CMPs can integrate much larger L2 caches than their 2D counterparts, and the L2 cache can occupy several layers. First, we explore the performance improvement from stacking SRAM L2 cache layers atop the processor layers. The experimental results show that 3D CMPs with two L2 cache layers improve performance by up to 55%, and by 34% on average, compared with 3D CMPs with one L2 cache layer. 3D CMPs also provide opportunities to compose future systems by integrating memories of disparate technologies: DRAM main memory, conventionally off-chip, can be stacked on the processor layers. Second, we study the performance benefit of integrating DRAM main memory into 3D CMPs. The experimental results show that stacked DRAM main memory provides up to 80%, and on average 34.2%, performance improvement for 3D CMPs compared with 2D CMPs with off-chip DRAM main memory. Our analysis and experimental results offer guidelines for designing efficient 3D CMPs with stacked SRAM L2 caches and DRAM main memories.
