A Graph-based Model for GPU Caching Problems

Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song
DOI: https://doi.org/10.48550/arXiv.1605.02043
2016-05-07
Abstract: Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling among different threads. Traditionally, in the field of parallel computing, graph partition models are used to model data communication and guide task scheduling. However, we find that previous methods are either inaccurate or expensive when applied to GPU programs. In this paper, we propose a novel task partition model that is accurate and enables the development of fast, high-quality task/data reorganization algorithms. We demonstrate the effectiveness of the proposed model through rigorous theoretical analysis of the algorithm bounds and extensive experimental analysis. The experimental results show that it achieves significant performance improvement across a representative set of GPU applications.
Distributed, Parallel, and Cluster Computing; Programming Languages