Compile-Time Automatic Synchronization Insertion and Redundant Synchronization Elimination for GPU Kernels.

Lifeng Liu, Meilin Liu, Chongjun Wang, Jun Wang
DOI: https://doi.org/10.1109/icpads.2016.0112
2016-01-01
Abstract: In most GPU kernel programs, synchronization statements are inserted manually by programmers, which is labor-intensive and error-prone. In this paper, we propose a synchronization optimization framework that automatically inserts synchronization statements into GPU kernels at compile time while eliminating redundant ones. Taking GPU kernels as input, the framework uses data dependence analysis to decide where synchronizations are needed. We implement it by extending CETUS, a source-to-source compiler framework. We show that the framework both inserts synchronizations correctly and eliminates redundant ones, outperforming existing compiler frameworks that introduce redundant synchronizations by following the most conservative strategy. Experimental results, based on extensive evaluation combined with manual comparison, show that the framework achieved 100% correctness. In addition, compared to the original GPU kernels, it reduces the number of synchronization statements in the kernels by 32.5% and the number of synchronization statements executed by 28.2% on average.
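To make the idea of dependence-driven insertion concrete, the following is a minimal toy sketch and not the paper's algorithm or its CETUS implementation. It assumes a straight-line kernel body with all manual barriers already stripped, and it conservatively treats any two accesses to the same shared array as a possible cross-thread dependence; a real analysis would compare index expressions and handle loops and branches. The names `Stmt`, `conflicts`, and `insert_barriers` are hypothetical.

```python
from dataclasses import dataclass

BARRIER = "__syncthreads();"


@dataclass
class Stmt:
    """One kernel statement plus the shared arrays it reads and writes."""
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(earlier, later):
    """True if the two statements touch a common shared array with a write."""
    return bool(
        (earlier.writes & later.reads)      # read after write
        or (earlier.reads & later.writes)   # write after read
        or (earlier.writes & later.writes)  # write after write
    )


def insert_barriers(stmts):
    """Emit a barrier only before a statement that conflicts with some
    statement issued since the previous barrier."""
    out = []
    pending = []  # statements not yet separated from the current one
    for s in stmts:
        if any(conflicts(p, s) for p in pending):
            out.append(BARRIER)
            pending = []
        out.append(s.text)
        pending.append(s)
    return out


# Hypothetical kernel body: only the shared array `tile` is tracked.
kernel = [
    Stmt("tile[tid] = in[gid];", writes=frozenset({"tile"})),
    Stmt("float x = tile[tid ^ 1];", reads=frozenset({"tile"})),
    Stmt("float y = x * 2.0f;"),
    Stmt("out[gid] = y;"),
]
result = insert_barriers(kernel)
```

In this sketch a single barrier is placed between the write to `tile` and the read of a neighbouring element; the two trailing statements touch no shared array, so a barrier placed after them by hand would count as redundant under these assumptions.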