Abstract: With the emergence of cutting-edge computing platforms such as cloud computing, edge computing, and on-chip neural network accelerators, designing advanced memory strategies that replace traditional ones and maximize the performance potential of non-volatile memory (NVM) under existing hardware conditions has become an urgent research issue for both academia and industry. Improving computer systems at the data-exchange layer with emerging advanced semiconductor devices is a promising and innovative direction. In this paper, three novel cache replacement strategies and two cache prefetching strategies are proposed to address inefficiencies in hybrid Magnetic Random Access Memory (MRAM) cache systems, such as intensive writes, high power consumption, and low hit rates. The three replacement strategies are based on historical frequency and time judgments, duplicate-data-aware deletion, and dynamic relevance factor computation, respectively, and each is designed to compensate for shortcomings of the traditional Least Recently Used (LRU) replacement strategy. The two prefetching strategies incorporate region distribution parameters and a ListNet ranking network, respectively, into the caching process to improve hit performance. Simulation results show that, compared with conventional LRU, the proposed replacement strategies achieve up to 61.76%, 84.91%, 56.49%, and 53.21% improvement in write count, hit rate, dynamic power, and IPC, respectively. The proposed prefetching strategy achieves up to 91.27% improvement in hit rate and 49.25% in IPC.
In addition, the paper presents a comprehensive evaluation of the replacement and prefetching strategies. The evaluation covers multi-core characteristics, information entropy, and the interplay and performance constraints between the replacement and prefetching mechanisms, and it may offer more credible guidance for managing memory inefficiencies and designing strategies in the future.
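The Python sketch below makes the contrast with LRU concrete. It simulates a single two-way cache set under two rules:

- plain LRU;
- a hypothetical victim-selection rule that combines historical access frequency with recency.

The frequency-plus-recency rule, the class names `LRUSet` and `FreqTimeSet`, the `replay` helper, and the access trace are all illustrative assumptions, not the paper's algorithm. The abstract does not specify how its frequency and time judgments are computed.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines that evicts the least recently used tag."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # resident tags, ordered from oldest to newest use
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used tag
        self.lines[tag] = None  # line fill (a write into the cache array)
        return False


class FreqTimeSet:
    """Hypothetical policy: evict the resident tag with the lowest historical
    access count, breaking ties by least recent use (illustrative only)."""

    def __init__(self, ways):
        self.ways = ways
        self.last_use = {}  # resident tag -> logical time of its last access
        self.history = {}   # tag -> total accesses seen, kept after eviction
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        self.clock += 1
        self.history[tag] = self.history.get(tag, 0) + 1
        if tag in self.last_use:
            self.hits += 1
            self.last_use[tag] = self.clock
            return True
        self.misses += 1
        if len(self.last_use) >= self.ways:
            # Lowest historical frequency first; among equals, the oldest access.
            victim = min(self.last_use,
                         key=lambda t: (self.history[t], self.last_use[t]))
            del self.last_use[victim]
        self.last_use[tag] = self.clock  # line fill
        return False


def replay(cache, trace):
    """Feed an access trace to a cache set and return (hits, misses)."""
    for tag in trace:
        cache.access(tag)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # A hot block "A" interleaved with one-off blocks that pollute an LRU set.
    trace = ["A", "A", "B", "C", "A", "D", "E", "A"]
    print("LRU       (hits, misses):", replay(LRUSet(2), trace))       # (1, 7)
    print("freq+time (hits, misses):", replay(FreqTimeSet(2), trace))  # (3, 5)
```

On this trace, LRU lets the one-off blocks push out the hot block `A`, while the frequency-aware rule keeps it resident. Because every miss triggers a line fill, fewer misses also mean fewer array writes, which matters for MRAM, where writes are comparatively costly. A pure frequency count can also keep formerly hot blocks resident after they go cold. For that reason, practical frequency-based policies often age or decay their counters.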

Memory Centric Hardware Prefetching in Multi-core Processors

Prefetching Techniques for STT-RAM Based Last-Level Cache in CMP Systems

Exploring Data Prefetching Mechanisms for Last Level Cache in Chip Multi-Processors

Exploring DRAM Cache Prefetching for Pooled Memory

Prefetch-directed Scheme for Accelerating Memory Accesses of PCIe-based I/O Subsystem

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture

Energy-Efficient Hardware Data Prefetching

Data Cache Prefetching with Perceptron Learning

Memory Affinity: Balancing Performance, Power, Thermal and Fairness for Multi-core Systems

Memory Access Scheduling Based on Dynamic Multilevel Priority in Shared DRAM Systems

Exploration and optimization of novel replacement and prefetching strategies for inefficiencies of advanced MRAM-based hybrid cache systems

Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs

Software Prefetching for Indirect Memory Accesses

Agent-Based Memory Access for Many-Core CMPs

A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-core/Many-Core Architecture

Puppeteer: A Random Forest-based Manager for Hardware Prefetchers across the Memory Hierarchy

Energy Aware Loop Scheduling for High Performance Multi-Module Memory

Direct Distributed Memory Access for CMPs

Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching

CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance

Helper Without Threads: Customized Prefetching for Delinquent Irregular Loads