FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration

Haitao Du, Yuhan Qin, Song Chen, Yi Kang
DOI: https://doi.org/10.1145/3649135
IF: 1.444
2024-02-23
ACM Transactions on Architecture and Code Optimization
Abstract: DRAM memory is a performance bottleneck for many applications, due to its high access latency. Previous work has mainly focused on data locality, introducing small-but-fast regions to cache frequently accessed data, thereby reducing the average latency. However, these locality-based designs have three challenges in modern multi-core systems: 1) Inter-application interference leads to random memory access traffic. 2) Fairness issues prevent the memory controller from over-prioritizing data locality. 3) Write-intensive applications have much lower locality and evict substantial dirty entries. With frequent data movement between the fast in-DRAM cache and slow regular arrays, the overhead induced by moving data may even offset the performance and energy benefits of in-DRAM caching. In this paper, we decouple the data movement process into two distinct phases. The first phase is Load-Reduced Destructive Activation (LRDA), which destructively promotes data into the in-DRAM cache. The second phase is Delayed Cycle-Stealing Restoration (DCSR), which restores the original data when the DRAM bank is idle. LRDA decouples the most time-consuming restoration phase from activation, and DCSR hides the restoration latency through prevalent bank-level parallelism. We propose FASA-DRAM, incorporating destructive activation and delayed restoration techniques to enable both in-DRAM caching and proactive latency-hiding mechanisms. Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area overhead. Furthermore, FASA-DRAM outperforms state-of-the-art designs in both performance and energy efficiency.
computer science, theory & methods, hardware & architecture
What problem does this paper attempt to address?
The paper addresses the high access latency of dynamic random-access memory (DRAM). Existing low-latency DRAM designs rely mainly on data locality to reduce average latency, but they face three challenges in modern multi-core systems:

1. **Inter-application interference**: Interleaved requests from multiple applications make memory access traffic effectively random, which undermines locality-based strategies.
2. **Fairness issues**: The memory controller cannot over-prioritize data locality, because doing so would let some applications or cores monopolize memory resources.
3. **Write-intensive applications**: These applications have much lower locality and evict many dirty entries, so data moves frequently between the fast in-DRAM cache and the slower regular arrays. The movement overhead can offset the performance and energy benefits of caching.

To address these challenges, the paper proposes a new DRAM architecture, FASA-DRAM, which decouples data movement into two phases:

1. **Load-Reduced Destructive Activation (LRDA)**: Destructively promotes data into the in-DRAM cache, which separates restoration, the most time-consuming phase, from activation.
2. **Delayed Cycle-Stealing Restoration (DCSR)**: Restores the original data when the DRAM bank is idle, using bank-level parallelism to hide the restoration latency.

With these techniques, FASA-DRAM reduces access latency while preserving data integrity. The evaluation reports a 19.9% average performance improvement and an 18.1% average reduction in DRAM energy consumption over DDR4 DRAM for four-core workloads, with less than 3.4% extra area overhead. Through detailed circuit simulations and system-level evaluations, the paper shows that FASA-DRAM outperforms existing low-latency DRAM designs in both performance and energy efficiency under multi-core workloads.
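
The two-phase idea (sense a row destructively first, restore it later when the bank is idle) can be illustrated with a toy single-bank timing model. This is only a sketch under stated assumptions, not the paper's simulator: the function `simulate`, the constants `T_SENSE`, `T_COLUMN`, `T_RESTORE`, and `CACHE_ROWS`, and the arrival trace are all hypothetical. The model also simplifies real DRAM behavior by serializing sensing, column access, and restoration, and by assuming every request needs a new row activation in the same bank.

```python
# Toy single-bank model. All timing values are hypothetical (arbitrary cycles),
# chosen for illustration only; they are not taken from the paper.
T_SENSE = 10     # sense a row (destructive read-out)
T_COLUMN = 5     # column access after sensing
T_RESTORE = 20   # write charge back into the cells
CACHE_ROWS = 4   # assumed number of rows that may await restoration


def simulate(arrivals, decoupled):
    """Return (per-request latencies, rows still awaiting restoration).

    Every request is assumed to need a fresh row activation in the same bank.
    """
    bank_free = 0   # time at which the bank can accept new work
    pending = 0     # rows sensed destructively but not yet restored
    latencies = []
    for arrival in arrivals:
        if decoupled:
            # Cycle stealing: start restores while the bank sits idle.
            # A restore that has started is not preempted, so it may
            # delay a request that arrives in the middle of it.
            while pending and bank_free < arrival:
                bank_free += T_RESTORE
                pending -= 1
            # No room left for another unrestored row: restore one first.
            if pending == CACHE_ROWS:
                bank_free += T_RESTORE
                pending -= 1
        start = max(arrival, bank_free)
        done = start + T_SENSE + T_COLUMN
        if decoupled:
            bank_free = done              # restoration is deferred
            pending += 1
        else:
            bank_free = done + T_RESTORE  # restoration follows immediately
        latencies.append(done - arrival)
    return latencies, pending


arrivals = [0, 10, 20, 100]  # hypothetical request arrival times
print(simulate(arrivals, decoupled=False))  # ([15, 40, 65, 20], 0)
print(simulate(arrivals, decoupled=True))   # ([15, 20, 25, 20], 1)
```

In this trace, the back-to-back requests at times 10 and 20 wait far less when restoration is deferred, because the bank is released as soon as the data has been read out. The last request shows the cost side of the toy model: a restore that started during the idle gap finishes slightly after the request arrives, so deferral does not shorten that access. The one row still pending at the end would be restored during a later idle period.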