Making GPU Warp Scheduler and Memory Scheduler Synchronization-Aware

Jianliang Ma, Tianzhou Chen, Minghui Wu
DOI: https://doi.org/10.1007/978-3-319-28430-9_12
2015-01-01
Abstract: Modern GPU applications often need to synchronize thousands of threads for correctness. Warp scheduling, memory coalescing, memory scheduling and similar mechanisms can give the warps in the same Cooperative Thread Array (CTA) different execution schedules. As a result, when synchronization happens, warps that arrive early must wait for the others, which introduces synchronization cost. In this paper, we examine the synchronization cost of multiple GPU applications using three metrics. With synchronization information at the CTA boundary, the warps still running in a CTA can determine their lagging degrees. We raise the warp scheduling priority of these warps and the memory scheduling priority of their memory requests to accelerate their execution, making the warp scheduler and the memory scheduler synchronization-aware. The experiments show that the synchronization-aware warp scheduling algorithm reduces the three synchronization metrics to 86.66%, 92.12% and 85.63% of the baseline and improves GPU performance by 5.76%. For memory-intensive benchmarks, the synchronization-aware memory scheduling algorithm improves system performance by 6.81%. Combining the two schedulers can further improve GPU performance by 6.46%.
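To make the priority-promotion idea concrete, here is a minimal Python sketch under stated assumptions. It assumes a hypothetical lagging-degree measure (the number of warps in the same CTA already waiting at the barrier), an oldest-first baseline for warp scheduling, and an FR-FCFS baseline (row hits first, then oldest first) for memory scheduling. The names `Warp`, `MemRequest`, `pick_warp` and `order_requests` are illustrative only; they are not taken from the paper or from any GPU simulator.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    warp_id: int
    cta_id: int
    age: int                  # baseline key: smaller value = older warp
    at_barrier: bool = False  # True once the warp is waiting at the CTA barrier
    ready: bool = True        # True if the warp can issue an instruction


@dataclass
class MemRequest:
    warp_id: int
    cta_id: int
    arrival: int              # cycle at which the request entered the queue
    row_hit: bool = False     # True if it targets the currently open DRAM row


def waiting_counts(warps: List[Warp]) -> Counter:
    """Hypothetical lagging measure: warps per CTA already waiting at the barrier.
    Counter returns 0 for CTAs with no waiting warps."""
    return Counter(w.cta_id for w in warps if w.at_barrier)


def pick_warp(warps: List[Warp]) -> Optional[Warp]:
    """Issue from the runnable warp whose CTA has the most peers waiting
    (the most-lagging warp); break ties with the oldest-first baseline."""
    waiting = waiting_counts(warps)
    runnable = [w for w in warps if w.ready and not w.at_barrier]
    if not runnable:
        return None
    return min(runnable, key=lambda w: (-waiting[w.cta_id], w.age))


def order_requests(reqs: List[MemRequest], warps: List[Warp]) -> List[MemRequest]:
    """Service requests from lagging CTAs first, then fall back to
    FR-FCFS ordering: row hits before misses, then oldest first."""
    waiting = waiting_counts(warps)
    return sorted(reqs, key=lambda r: (-waiting[r.cta_id], not r.row_hit, r.arrival))


# Usage: CTA 0 has two warps at the barrier and one straggler (warp 3).
warps = [
    Warp(warp_id=0, cta_id=1, age=0),
    Warp(warp_id=1, cta_id=0, age=1, at_barrier=True),
    Warp(warp_id=2, cta_id=0, age=2, at_barrier=True),
    Warp(warp_id=3, cta_id=0, age=3),
    Warp(warp_id=4, cta_id=1, age=4),
]
reqs = [
    MemRequest(warp_id=0, cta_id=1, arrival=0, row_hit=True),
    MemRequest(warp_id=3, cta_id=0, arrival=5),
    MemRequest(warp_id=4, cta_id=1, arrival=2),
]
print(pick_warp(warps).warp_id)                            # 3
print([r.warp_id for r in order_requests(reqs, warps)])    # [3, 0, 4]
```

In this example a plain oldest-first scheduler would issue warp 0, but the sketch issues warp 3, the only warp of CTA 0 that has not yet reached the barrier, and its memory request is served ahead of an older row hit. The baseline policies are kept as tie-breakers, so behavior reduces to the baseline when no CTA has warps waiting.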