Analyzing Memory Access on CPU-GPGPU Shared LLC Architecture

Jianliang Ma,Licheng Yu,Tianzhou Chen,Minghui Wu
DOI: https://doi.org/10.1109/ISPDC.2015.18
2015-01-01
Abstract:The data exchange between GPGPUs and CPUs are becoming more and more important nowadays. One trend in industry to alleviate the long latency is to integrate CPUs and GPGPUs on a single chip. In this paper, we analyze the reference interactions between CPU and GPGPU applications with a CPU-GPGPU co-simulator that integrates the gem5 and gpgpu-sim together. Since the memory controllers are shared among all cores, we observe severe memory contention between them. The CPU applications suffer a 1.26x slowdown and 64.79% blocked time in main memory when they run parallels with GPGPU applications. To alleviate the contention and provide more memory band-width, shared last level caches (LLCs) are commonly employed in such systems. We test a banked shared LLC structure that implanted into the co-simulator. We show that a simple shared LLC contributes mostly to the GPGPU (2.13x to running alone and 1.7x to running in parallel), rather than CPU. With the help of LLC, the memory requests issued to main memory is reduced to 30.74%, the blocked time is reduced to 49.64%, which provides more memory bandwidth. The latency-sensitive CPU applications are suffered as the LLC buffer occupation is very high when they run with GPGPU in parallel. Besides, as the number of LLC cache bank grows, we reveal that CPU achieves higher speedup than GPGPUs by increasing LLC parallelism. Finally, we also discuss the impact of GPGPU L2 cache. And we find that fewer GPGPU L2 cache banks will lower the performance as they limits the parallelism of GPGPU. The observations and inferences in this paper may serve as a reference guide to future CPU-GPGPU shared LLC design.
What problem does this paper attempt to address?