Improving the Efficiency of GPGPU Work-Queue Through Data Awareness

Libo Huang, Yashuai Lü, Li Shen, Zhiying Wang
DOI: https://doi.org/10.1145/3151035
2017-01-01
Abstract: The architecture and programming model of current GPGPUs are best suited for applications that are dominated by structured control and data flows across large regular datasets. Parallel workloads with irregular control and data structures cannot easily harness the processing power of the GPGPU. One approach for mapping these irregular-parallel workloads to GPGPUs is to use work-queues. The work-queue approach improves the utilization of SIMD units by processing only useful work that is dynamically generated during execution. As current GPGPUs lack the necessary support for work-queues, a software-based work-queue implementation often suffers from memory contention and load-balancing issues. In this article, we present a novel hardware work-queue design named DaQueue, which incorporates three data-aware features to improve the efficiency of work-queues on GPGPUs. We evaluate our proposal on irregular-parallel workloads and carry out a case study on a path tracing pipeline with a cycle-level simulator. Experimental results show that for the tested workloads, DaQueue improves performance by 1.53× on average and up to 1.91×. Compared to a state-of-the-art hardware worklist approach, DaQueue achieves an average of 33.92% extra speedup with less hardware area cost.