SPGPU: Spatially Programmed GPU

Shizhuo Zhu, Illia Shkirko, Jacob Levinson, Zhengrong Wang, Tony Nowatzki
DOI: https://doi.org/10.1109/lca.2024.3499339
IF: 2.3
2024-11-29
IEEE Computer Architecture Letters
Abstract: Communication is a critical bottleneck for GPUs, manifesting as energy and performance overheads due to network-on-chip (NoC) delay and congestion. While many algorithms exhibit locality among thread blocks and accessed data, modern GPUs lack the interface to exploit this locality: GPU thread blocks are mapped to cores obliviously. In this work, we explore a simple extension to the conventional GPU programming interface to enable control over the spatial placement of data and threads, yielding new opportunities for aggressive locality optimizations within a GPU kernel. Across 7 workloads that can take advantage of these optimizations, for a 32 (or 128) SM GPU, we achieve a 1.28× (1.54×) speedup and a 35% (44%) reduction in NoC traffic compared to baseline non-spatial GPUs.
computer science, hardware & architecture