Fast Schedule Tensor Computation on GPU with High Data Reuse and Device Utilization

Yuxiang Zhang, Yu Zhang
DOI: https://doi.org/10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00084
2019-01-01
Abstract: Tensor computation, that is, computation on high-dimensional arrays, is widely used in deep learning, image processing, and scientific computing, and the GPU has become the mainstream platform for accelerating it. We propose an algorithm that efficiently finds a promising schedule to exploit the parallelism and locality of a computation on the GPU. In particular, we design an empirical model that jointly considers locality, load balance, and sufficiency of parallelism on a given GPU model to measure the quality of a candidate schedule. We also introduce empirical constraints that significantly reduce the schedule search space, to polynomial complexity in terms of the computation dimensions. Compared with the state-of-the-art tool Tensor Comprehensions, our algorithm finds a promising schedule 5-45x faster, and the resulting scheduled code runs 1.5-10x faster.
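To make the idea of scoring candidate schedules concrete, the following is a minimal illustrative sketch in Python, not the paper's actual model or algorithm. It assumes a matrix-multiply-like computation with a 2D output, treats a "schedule" as just a pair of tile sizes, and prunes the search to power-of-two tiles. The function names, the scoring formulas, and the hardware parameters (`num_sms`, `max_threads_per_block`, `threads_per_sm`) are all hypothetical choices made for illustration.

```python
from itertools import product
from math import ceil


def candidate_tiles(extent, max_tile=32):
    """Power-of-two tile sizes up to min(extent, max_tile).

    Restricting to powers of two is an assumed pruning rule that keeps
    the number of candidates per dimension logarithmic in the extent.
    """
    sizes = []
    t = 1
    while t <= min(extent, max_tile):
        sizes.append(t)
        t *= 2
    return sizes


def score(tile, dims, num_sms=80, max_threads_per_block=1024,
          threads_per_sm=2048):
    """Toy quality score in [0, 1]; higher is better.

    Hypothetical stand-in for an empirical model combining locality,
    load balance, and sufficiency of parallelism.
    """
    tm, tn = tile
    m, n = dims
    threads = tm * tn  # one thread per output element of the tile
    if threads > max_threads_per_block:
        return 0.0  # infeasible on the assumed device

    blocks = ceil(m / tm) * ceil(n / tn)

    # Locality: a tm x tn output tile reuses tm rows and tn columns of
    # the inputs, so reuse grows as (tm * tn) / (tm + tn).
    reuse = (tm * tn) / (tm + tn)
    locality = reuse / (reuse + 1.0)

    # Load balance: fraction of SM slots doing useful work when blocks
    # are dispatched in waves of num_sms.
    waves = ceil(blocks / num_sms)
    balance = blocks / (waves * num_sms)

    # Sufficiency of parallelism: enough threads to fill every SM.
    sufficiency = min(1.0, blocks * threads / (num_sms * threads_per_sm))

    return locality * balance * sufficiency


def best_schedule(dims):
    """Exhaustively score the pruned candidate set; return the best tile."""
    m, n = dims
    candidates = product(candidate_tiles(m), candidate_tiles(n))
    return max(candidates, key=lambda tile: score(tile, dims))


if __name__ == "__main__":
    print(best_schedule((1024, 1024)))
```

With these assumed parameters, the search over a 1024 x 1024 output evaluates only 36 tile pairs rather than every possible tiling, which illustrates in a small way how constraints on the candidate set keep the search cheap.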