LLC Buffer for Arbitrary Data Sharing in Heterogeneous Systems.

Yu Licheng, Pei Yulong, Chen Tianzhou, Lou Xueqing, Wu Minghui, Zhang Tiefei
DOI: https://doi.org/10.1109/hpcc-smartcity-dss.2016.0046
2016-01-01
Abstract: A closely coupled CPU and GPGPU system with a shared last-level cache (LLC) enables fine-grained data exchange. However, traditional data exchange causes unnecessary LLC misses and degrades overall system performance. We believe that the cache organization is not suitable for temporary data exchange in a closely coupled system. We analyze the memory access pattern and identify the inefficiency in data exchange. When the exchanged data cannot fit in the LLC, the low LLC hit rate exacerbates core stalls and memory contention. We also show that the stalls cannot be entirely covered by increasing the compute load or parallelism. In previous work, a simple LLC buffer was introduced to replace the cache with an architecture-supported data queue. However, the simple design limits the data element size and requires potentially very large storage for pending requests. In this paper, we propose an improved LLC buffer. It adopts an element-atom data organization to enable data exchange of arbitrary size. A simple hardware-software collaborative protocol eliminates the pending requests. Experimental results show an average speedup of 48.2% over traditional data exchange, but a 7.5% slowdown relative to the simple LLC buffer due to the overhead of the protocol. We also compare it with the fine-grain task, which implements a data exchange channel between CPU and GPGPU. The results show that the improved LLC buffer has less storage overhead and higher access efficiency than the fine-grain task.
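
The element-atom idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract gives no atom size, queue layout, or protocol details, so `ATOM_BYTES`, the length-header atom, the `AtomQueue` class, and the "refuse and retry" behavior are all hypothetical choices made for illustration. It shows one plausible reading: an element of arbitrary size is split into fixed-size atoms, and a producer whose element does not fit is told to retry instead of having its request stored as pending.

```python
from collections import deque

ATOM_BYTES = 8  # assumed atom size; the abstract does not state one


def split_into_atoms(element: bytes, atom_bytes: int = ATOM_BYTES) -> list[bytes]:
    """Split one element into fixed-size atoms, zero-padding the last one."""
    atoms = []
    for start in range(0, len(element), atom_bytes):
        chunk = element[start:start + atom_bytes]
        atoms.append(chunk.ljust(atom_bytes, b"\x00"))
    return atoms


class AtomQueue:
    """Bounded FIFO of atoms; a hypothetical software stand-in for the buffer."""

    def __init__(self, capacity_atoms: int):
        self.capacity = capacity_atoms
        self.slots = deque()

    def try_push(self, element: bytes) -> bool:
        """Enqueue a whole element, or refuse if it does not fit right now."""
        atoms = split_into_atoms(element)
        # One extra slot is needed for the header atom.
        if len(self.slots) + len(atoms) + 1 > self.capacity:
            return False  # caller retries later; no pending request is stored
        # Header atom (an assumption) records the element's true byte length.
        self.slots.append(len(element).to_bytes(ATOM_BYTES, "little"))
        self.slots.extend(atoms)
        return True

    def try_pop(self):
        """Dequeue one whole element, or return None if the queue is empty."""
        if not self.slots:
            return None
        length = int.from_bytes(self.slots.popleft(), "little")
        n_atoms = -(-length // ATOM_BYTES)  # ceiling division
        data = b"".join(self.slots.popleft() for _ in range(n_atoms))
        return data[:length]  # strip the zero padding
```

In this model a producer calls `try_push` and retries on `False`, while a consumer calls `try_pop` and retries on `None`; elements of different sizes share the same queue because only the atom size is fixed.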