Last Level Cache Layout Remapping for Heterogeneous Systems

Licheng Yu, Tianzhou Chen, Minghui Wu, Xueqing Lou
DOI: https://doi.org/10.1016/j.sysarc.2018.05.002
IF: 5.836
2018-01-01
Journal of Systems Architecture
Abstract: Heterogeneous systems in which the CPU and GPGPU share the last level cache (LLC) provide viability and flexibility. However, their different programming models lead to conflicting memory layouts, each of which is required for the best performance of its processor. Software conversion that directly accesses the target layout suffers from suboptimal locality. Conversion in GPGPU shared memory also incurs copying and synchronization overhead. In this paper, we analyze the memory layout requirements and propose to remap the memory layout in the shared LLC. A remap controller in the LLC executes a simple program that calculates target requests from an LLC request in the source memory space. The LLC request is thus remapped to the target memory space through the generated requests. Consequently, all processors always access memory in their optimal data layouts. Locality is thus preserved through all the private caches, and software remapping overhead is eliminated. Tiled matrix multiplication is discussed as a case study, and benchmarks from Polybench/GPU and Rodinia are modified to take advantage of the LLC layout remapping. The experimental results show that the average benchmark execution time is reduced to 69%. Compared with CPU software layout conversion, the CPU time is reduced to 41%-73%.
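To make the remapping idea concrete, the following is a minimal software sketch of how one cache-line request in a source layout could be translated into requests in a target layout. It is not the paper's remap controller program: the choice of a row-major source layout and a tiled target layout, the function names `row_major_to_tiled` and `remap_line_request`, and the parameters `n` (matrix dimension), `t` (tile dimension), and `line_elems` (elements per cache line) are all assumptions made for illustration.

```python
def row_major_to_tiled(offset, n, t):
    """Map a row-major element offset of an n x n matrix to its offset
    in a layout that stores t x t tiles contiguously.

    Assumes n is a multiple of t (illustrative simplification).
    """
    r, c = divmod(offset, n)          # row and column in the matrix
    tile_r, in_r = divmod(r, t)       # tile row, row inside the tile
    tile_c, in_c = divmod(c, t)       # tile column, column inside the tile
    tiles_per_row = n // t
    tile_index = tile_r * tiles_per_row + tile_c
    return tile_index * t * t + in_r * t + in_c


def remap_line_request(src_line, n, t, line_elems):
    """Return the target-space line numbers touched by one source-space
    cache-line request, in first-touch order (hypothetical helper)."""
    first = src_line * line_elems
    target_lines = []
    for off in range(first, first + line_elems):
        line = row_major_to_tiled(off, n, t) // line_elems
        if line not in target_lines:
            target_lines.append(line)
    return target_lines
```

With an 8 x 8 matrix, 4 x 4 tiles, and 8 elements per line, `remap_line_request(0, 8, 4, 8)` yields two target lines, since one row-major line spans two tiles. This illustrates why a single source request may expand into several generated requests in the target memory space.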