Abstract: Shared last level caches are widely used in modern multicore processors. However, uncontrolled cache sharing on a multicore processor leads to more serious cache pollution than on a single-core processor: a process with weak locality can evict strong-locality data sets that belong to other concurrent processes, so processes running together on a multicore system with a shared last level cache interfere with one another. Prior approaches either partition the shared cache at the process level to reduce inter-process cache contention, or isolate non-temporal memory accesses to accelerate the execution of a single application. Process-based cache partitioning, however, may make intra-process cache pollution more serious and significantly degrade single-process performance. In this work, we take an alternative view and explore physical page layout optimization that combines process-based cache partitioning with pollute region isolation to improve shared last level cache utilization on multicore systems. Our approach has three steps. The first step determines the cache sizes of the co-scheduled applications. The second step identifies the weak-locality regions of each application under different cache size configurations. The third step customizes the physical page layout to partition cache space among concurrent processes and sets up a global pollute buffer that maps pollute regions into a small slice of the shared last level cache. Our approach can be used directly on commercial multicore systems without any additional hardware support. Experimental results show that, compared with the default Linux memory management scheme, our approach improves performance by 26.73% on average. Even compared with the process-based cache partitioning scheme RapidMRC, our approach further eliminates the harmful effect of non-reusable data and improves system performance by 5.63% on average.
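
The third step, a physical page layout that gives each process its own cache slice and sends pollute regions to a shared global buffer, can be illustrated with page coloring. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a physically indexed cache whose page color is the page frame number modulo the number of colors (real processors may hash addresses across cache slices), and the cache geometry, the per-process color demands, and all function names are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages


def num_page_colors(cache_bytes, associativity, page_size=PAGE_SIZE):
    """Number of page colors: cache sets covered by one way, in units of pages."""
    return cache_bytes // (associativity * page_size)


def page_color(pfn, n_colors):
    """Color of a physical page frame (assumes simple modulo set indexing)."""
    return pfn % n_colors


def build_layout(color_demand, n_colors, pollute_colors):
    """Split the colors into one small global pollute buffer plus one
    private slice per process.

    color_demand: dict mapping process name -> number of colors it needs
                  (a hypothetical stand-in for the output of step one).
    """
    if sum(color_demand.values()) + pollute_colors > n_colors:
        raise ValueError("demands exceed the available page colors")
    layout = {"pollute_buffer": list(range(pollute_colors))}
    next_color = pollute_colors
    for proc, want in color_demand.items():
        layout[proc] = list(range(next_color, next_color + want))
        next_color += want
    return layout


def allowed_colors(layout, proc, is_pollute_region):
    """Colors a page may be allocated from: weak-locality (pollute) regions
    of every process share the global buffer; other pages stay private."""
    return layout["pollute_buffer"] if is_pollute_region else layout[proc]


# Hypothetical 4 MiB, 16-way last level cache shared by two processes.
n_colors = num_page_colors(4 * 1024 * 1024, 16)
layout = build_layout({"A": 40, "B": 20}, n_colors, pollute_colors=4)
```

In this sketch, a page allocator would only hand a process a free frame whose `page_color` falls within `allowed_colors` for the region being mapped, which is how the layout confines weak-locality data to the small buffer.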

Combining Process-Based Cache Partitioning and Pollute Region Isolation to Improve Shared Last Level Cache Utilization on Multicore Systems