Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Dajiang Liu, Shouyi Yin, Yu Peng, Leibo Liu, Shaojun Wei
DOI: https://doi.org/10.1109/TVLSI.2014.2371854
2015-01-01
Abstract: Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their flexibility and efficiency. Loops in applications are often mapped onto CGRAs for acceleration, and mapping loops onto a CGRA is challenging because of the parallel execution paradigm and the constrained hardware resources. To map loops onto CGRAs efficiently, it is important to transform loops into pieces that obey the hardware resource constraints while incurring less overhead (e.g., communication and configuration overhead). In this paper, we tackle this problem by formulating a performance optimization problem that covers both loop transformation and back-end placement and routing. A novel search strategy is also designed to find the optimal result efficiently. Finally, we build a complete flow for mapping loop nests onto CGRAs. Experimental results on most kernels of PolyBench show that our proposed approach improves the performance of the kernels by 42% on average compared with state-of-the-art methods. The runtime complexity of our approach is also acceptable.
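To make the idea of cutting a loop nest into resource-feasible pieces more concrete, here is a minimal sketch under assumed conditions. It is not the authors' method. A 2-D loop nest of N x M iterations is split into ti x tj tiles. Each tile must fit on the PE array (ti * tj * ops_per_iter <= num_pes), and each tile pays a hypothetical configuration cost, an execution cost, and a boundary-communication cost proportional to ti + tj. The cost model, the names CgraModel, tile_cost and best_tiling, and the plain exhaustive search are all illustrative assumptions. The paper's own formulation also covers placement and routing and uses a dedicated search strategy.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CgraModel:
    """Hypothetical CGRA cost parameters (illustrative only, not from the paper)."""
    num_pes: int        # PEs available for one spatial mapping
    config_cost: int    # cycles to load one configuration (one per tile)
    exec_cycles: int    # cycles to execute one mapped tile
    comm_per_elem: int  # cycles per boundary element exchanged between tiles


def tile_cost(n, m, ti, tj, ops_per_iter, hw):
    """Return the modeled cost of a (ti, tj) tiling, or None if a tile does not fit."""
    if ti * tj * ops_per_iter > hw.num_pes:
        return None  # the tile violates the PE resource constraint
    n_tiles = math.ceil(n / ti) * math.ceil(m / tj)
    # Assumed per-tile overhead: reconfiguration + execution + boundary traffic.
    per_tile = hw.config_cost + hw.exec_cycles + hw.comm_per_elem * (ti + tj)
    return n_tiles * per_tile


def best_tiling(n, m, ops_per_iter, hw):
    """Exhaustively enumerate tile shapes and return (cost, ti, tj) of the cheapest."""
    best = None
    for ti, tj in product(range(1, n + 1), range(1, m + 1)):
        cost = tile_cost(n, m, ti, tj, ops_per_iter, hw)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, ti, tj)
    return best


if __name__ == "__main__":
    hw = CgraModel(num_pes=16, config_cost=10, exec_cycles=1, comm_per_elem=1)
    print(best_tiling(8, 8, 1, hw))  # (76, 4, 4)
```

With 16 PEs and an 8 x 8 loop nest, the 4 x 4 tile wins under this toy model. It needs only four configurations, and its boundary cost stays small. Exhaustive enumeration is only practical for small loop bounds, which is one reason a more efficient search strategy is worthwhile.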