Improving Inter-kernel Data Reuse With CTA-Page Coordination in GPGPU

Xuanyi Li, Chen Li, Yang Guo, Rachata Ausavarungnirun
DOI: https://doi.org/10.1109/ICCAD51958.2021.9643535
2021-01-01
Abstract: Although modern GPUs are equipped with expanding memory, accommodating the entire working set of large-scale workloads can still be a challenge. With the support of unified virtual memory and demand paging, programmers can transparently oversubscribe the main memory. However, this transparent management still comes at a severe performance cost, especially for applications with inter-kernel data sharing. While there have been many efforts to reduce the additional data migrations caused by memory oversubscription, few consider the reuse of shared data at the boundary between adjacent kernels. We observe that, due to limited memory capacity, a kernel often demands shared pages that were evicted during the previous kernel, resulting in a significant number of costly data migrations. In this paper, we propose a CTA-Page collaborative framework, called CPC, that transparently reduces the impact of memory oversubscription by coordinating CTA dispatch switching with page replacement switching to reuse inter-kernel shared data. We evaluate CPC with a variety of GPGPU benchmark suites. Experimental results show that CPC improves system performance by 65% over the state-of-the-art technique for applications with inter-kernel data sharing.
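The observation about shared pages being evicted before the next kernel needs them can be illustrated with a toy model. The sketch below is not the CPC implementation: it assumes a plain LRU replacement policy, page-granular sequential accesses, and two kernels that touch the same pages. The function names `count_faults` and `run_two_kernels` are hypothetical. It only shows why the order in which the second kernel touches shared pages can change the number of migrations when memory is oversubscribed.

```python
from collections import OrderedDict


def count_faults(accesses, capacity, resident=None):
    """Run an LRU page cache over `accesses`.

    Returns (number of page faults, resulting resident set).
    `resident` lets a second kernel start from the first kernel's state.
    """
    resident = OrderedDict() if resident is None else resident
    faults = 0
    for page in accesses:
        if page in resident:
            # Hit: mark the page as most recently used.
            resident.move_to_end(page)
        else:
            # Fault: the page must be migrated in, evicting the LRU page if full.
            faults += 1
            if len(resident) >= capacity:
                resident.popitem(last=False)
            resident[page] = True
    return faults, resident


def run_two_kernels(num_pages, capacity, reverse_second):
    """Return the faults incurred by the second of two kernels sharing all pages.

    Assumption for illustration: kernel 1 touches pages 0..num_pages-1 in order;
    kernel 2 touches the same pages in the same or the reversed order.
    """
    first = list(range(num_pages))
    second = first[::-1] if reverse_second else first
    _, resident = count_faults(first, capacity)
    faults, _ = count_faults(second, capacity, resident)
    return faults


if __name__ == "__main__":
    print("same order:    ", run_two_kernels(8, 6, reverse_second=False))
    print("reversed order:", run_two_kernels(8, 6, reverse_second=True))
```

With 8 shared pages and room for 6, the second kernel faults on all 8 pages when it repeats the first kernel's order, but on only 2 when it starts from the pages that are still resident. How CPC actually switches CTA dispatch and page replacement is described in the paper itself, not in this sketch.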