FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration

Haitao Du,Yuhan Qin,Song Chen,Yi Kang

DOI: https://doi.org/10.1145/3649135

IF: 1.444

2024-02-23

ACM Transactions on Architecture and Code Optimization

Abstract: DRAM memory is a performance bottleneck for many applications, due to its high access latency. Previous work has mainly focused on data locality, introducing small-but-fast regions to cache frequently accessed data, thereby reducing the average latency. However, these locality-based designs face three challenges in modern multi-core systems: 1) Inter-application interference leads to random memory access traffic. 2) Fairness issues prevent the memory controller from over-prioritizing data locality. 3) Write-intensive applications have much lower locality and evict substantial dirty entries. With frequent data movement between the fast in-DRAM cache and slow regular arrays, the overhead induced by moving data may even offset the performance and energy benefits of in-DRAM caching. In this paper, we decouple the data movement process into two distinct phases. The first phase is Load-Reduced Destructive Activation (LRDA), which destructively promotes data into the in-DRAM cache. The second phase is Delayed Cycle-Stealing Restoration (DCSR), which restores the original data when the DRAM bank is idle. LRDA decouples the most time-consuming restoration phase from activation, and DCSR hides the restoration latency through prevalent bank-level parallelism. We propose FASA-DRAM, incorporating destructive activation and delayed restoration techniques to enable both in-DRAM caching and proactive latency-hiding mechanisms. Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area overhead. Furthermore, FASA-DRAM outperforms state-of-the-art designs in both performance and energy efficiency.

computer science, theory & methods, hardware & architecture

What problem does this paper attempt to address?

The main problem this paper addresses is the high access latency of dynamic random-access memory (DRAM). Existing low-latency DRAM designs rely mainly on data locality to reduce average latency, but they face the following challenges in modern multi-core systems:

1. **Inter-application interference**: It produces random memory access traffic, which makes locality-based strategies less effective.
2. **Fairness issues**: The memory controller cannot over-prioritize data locality, because it must prevent individual applications or cores from monopolizing memory resources.
3. **Write-intensive applications**: Such applications have low data locality and evict many dirty entries, so data moves frequently between the fast in-DRAM cache and the slower regular arrays, and the movement overhead can offset the benefits of caching.

To address these challenges, the paper proposes a new DRAM architecture, FASA-DRAM, which reduces access latency and energy consumption through two key techniques:

1. **Load-Reduced Destructive Activation (LRDA)**: It destructively promotes data into the in-DRAM cache during activation, decoupling the most time-consuming phase, restoration, from the activation itself.
2. **Delayed Cycle-Stealing Restoration (DCSR)**: It restores the original data when the DRAM bank is idle and uses bank-level parallelism to hide the restoration latency.

With these techniques, FASA-DRAM reduces access latency while maintaining data integrity, and it improves the overall performance and energy efficiency of the system. Using detailed circuit simulations and system-level evaluations, the paper shows that FASA-DRAM outperforms existing low-latency DRAM designs in both performance and energy efficiency under multi-core workloads.
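The split between destructive activation and delayed restoration can be illustrated with a toy model. The sketch below is not the paper's design or timing: the `Bank` class, its method names, and the cycle counts `T_SENSE` and `T_RESTORE` are all hypothetical, chosen only to show how deferring restoration shortens the critical path of a promotion and how the deferred work can later be fitted into an idle window.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical cycle counts, for illustration only (not taken from the paper).
T_SENSE = 10     # sensing part of an activation
T_RESTORE = 20   # charge-restoration part of an activation


@dataclass
class Bank:
    """Toy model of one DRAM bank with a queue of rows awaiting restoration."""
    pending: deque = field(default_factory=deque)
    busy_until: int = 0

    def promote_with_restore(self, now: int) -> int:
        """Baseline: sensing and restoration both sit on the critical path."""
        start = max(now, self.busy_until)
        self.busy_until = start + T_SENSE + T_RESTORE
        return self.busy_until - now

    def promote_destructive(self, now: int, row: int) -> int:
        """LRDA-like step: sense only, and remember that the row needs restoring."""
        start = max(now, self.busy_until)
        self.busy_until = start + T_SENSE
        self.pending.append(row)
        return self.busy_until - now

    def steal_idle_cycles(self, now: int, idle_until: int) -> list:
        """DCSR-like step: restore queued rows that fit fully in the idle window."""
        restored = []
        t = max(now, self.busy_until)
        while self.pending and t + T_RESTORE <= idle_until:
            restored.append(self.pending.popleft())
            t += T_RESTORE
        self.busy_until = max(self.busy_until, t)
        return restored


baseline = Bank()
print(baseline.promote_with_restore(now=0))          # 30 cycles on the critical path

bank = Bank()
print(bank.promote_destructive(now=0, row=7))        # 10 cycles on the critical path
print(bank.steal_idle_cycles(now=10, idle_until=25)) # [] (window too short)
print(bank.steal_idle_cycles(now=10, idle_until=40)) # [7] (restored while idle)
```

In this simplified view the restoration work is not eliminated, only moved off the critical path; the benefit depends on the bank actually having idle windows, which the paper attributes to bank-level parallelism.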

Design Of A Dynamic Memory Access Scheduler

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity

DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric

Exploring DRAM Cache Prefetching for Pooled Memory

Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient in-DRAM Operations

Reducing Performance Impact of DRAM Refresh by Parallelizing Refreshes with Accesses

TDRAM: Tag-enhanced DRAM for Efficient Caching

Trade-off between Hit Rate and Hit Latency for Optimizing DRAM Cache

GC-ARM: Garbage Collection-Aware RAM Management for Flash Based Solid State Drives

Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers

Improving DRAM Performance by Parallelizing Refreshes with Accesses

Half-DRAM: A High-Bandwidth and Low-Power DRAM Architecture from the Rethinking of Fine-Grained Activation

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism

Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips

Understanding and Improving the Latency of DRAM-Based Memory Systems

Delay-Hiding energy management mechanisms for DRAM

A Hybrid Main Memory Architecture Design for Reducing DRAM System Refresh Power

A Performance Evaluation of DRAM Access for In-Memory Databases

Read-Tuned STT-RAM and eDRAM Cache Hierarchies for Throughput and Energy Enhancement