Improve GPGPU Latency Hiding with a Hybrid Recovery Stack and a Window Based Warp Scheduling Policy.

Tianzhou Chen, Xingsheng Tang, Licheng Yu, Jianliang Ma, Minghui Wu
DOI: https://doi.org/10.1109/hpcc.2012.190
2012-01-01
Abstract: Branch divergence usually has a serious impact on the efficiency of a SIMD pipeline. Dynamic Warp Subdivision's (DWS) branch method, however, exploits branch divergence to hide memory latency by interleaving issue among all branch paths of a warp. This method may suffer from a serious over-subdivision problem. We therefore propose a hybrid stack mechanism that enables the PDOM stack to issue any ready sub-warp without losing the logical structure of the PDOM stack. To maximize the hybrid stack's potential, we propose a window-based scheduling policy that reinforces memory latency hiding. Experimental results show that combining our window-based scheduling policy with the hybrid stack hardware improves performance by 10% over the baseline configuration with the PDOM loose round-robin method, and by 6.8% over DWS-PC with our window-based scheduling policy, on our 7 selected benchmark programs.