Hybrid Static/Dynamic Schedules for Tiled Polyhedral Programs
Tian Jin, Nirmal Prajapati, Waruna Ranasinghe, Guillaume Iooss, Yun Zou, Sanjay Rajopadhye, David Wonnacott
DOI: https://doi.org/10.48550/arXiv.1610.07236
2016-10-24
Abstract: Polyhedral compilers perform optimizations such as tiling and parallelization; when doing both, they usually generate code that executes "barrier-synchronized wavefronts" of tiles. We present a system to express and generate code for hybrid schedules, where some constraints are automatically satisfied through the structure of the code, and the remainder are dynamically enforced at run-time with data flow mechanisms. We prove bounds on the added overheads that are better, by at least one polynomial degree, than those of previous techniques.
We propose a generic mechanism to implement the needed synchronization, and show that it can be easily realized for a variety of targets: OpenMP, Pthreads, GPU (CUDA or OpenCL) code, languages like X10, Habanero, and Cilk, data flow platforms like DAGuE and OpenStream, and MPI. We also provide a simple concrete implementation that works without the need for any sophisticated run-time mechanism.
Our experiments show our simple implementation to be competitive with or better than the wavefront-synchronized code generated by other systems. We also show how the proposed mechanism can achieve a 24% to 70% reduction in energy.
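
To make the idea of a hybrid schedule concrete, the following is a minimal illustrative sketch in Python, not the authors' generated code. It assumes a two-dimensional grid of tiles in which tile (i, j) depends on tiles (i, j-1) and (i-1, j). The dependence on (i, j-1) is satisfied statically, because each worker runs one row of tiles in program order; the dependence on (i-1, j) is enforced dynamically with a per-tile completion flag, so no global wavefront barrier is needed. The names `run_hybrid`, `pascal_table`, and `tile_body` are hypothetical and chosen only for this example.

```python
import threading


def run_hybrid(n_rows, n_cols, tile_body):
    """Execute an n_rows x n_cols tile grid with a hybrid schedule.

    Assumed dependences: tile (i, j) needs (i, j-1) and (i-1, j).
    """
    # done[i][j] is set once tile (i, j) has finished executing.
    done = [[threading.Event() for _ in range(n_cols)] for _ in range(n_rows)]

    def worker(i):
        # Static part: tiles of row i run left to right, so the
        # (i, j-1) -> (i, j) dependence holds by program order.
        for j in range(n_cols):
            # Dynamic part: wait only for the one tile above, instead of
            # a barrier across the whole wavefront.
            if i > 0:
                done[i - 1][j].wait()
            tile_body(i, j)
            done[i][j].set()

    # One worker per row (an assumption made to keep the sketch short).
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_rows)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def pascal_table(n_rows, n_cols):
    """Toy workload: each 'tile' is one cell summing its upper and left neighbors."""
    table = [[0] * n_cols for _ in range(n_rows)]

    def tile_body(i, j):
        up = table[i - 1][j] if i > 0 else 0
        left = table[i][j - 1] if j > 0 else 0
        table[i][j] = 1 if (i == 0 and j == 0) else up + left

    run_hybrid(n_rows, n_cols, tile_body)
    return table
```

Calling `pascal_table(4, 5)` fills the grid with binomial coefficients, which gives a simple check that every tile ran only after both of its predecessors. A real implementation would use a fixed pool of threads and tiles of substantial size; the sketch only shows how the two kinds of constraints are split.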
Programming Languages; Distributed, Parallel, and Cluster Computing; Performance