Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Xinhan Lin, Shouyi Yin, Leibo Liu, Shaojun Wei
DOI: https://doi.org/10.1109/tpds.2016.2531678
IF: 5.3
2016-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Coarse-grained reconfigurable architecture (CGRA) is a promising platform for loop acceleration, but existing software pipelining methods cannot achieve satisfactory performance on a fair number of imperfect nested loops, especially those with sibling inner loops. To tackle this problem, this paper makes two contributions: 1) a 2-level pipelining method with an effective initiation interval (II) optimization strategy for imperfect loops with sibling inner loops; 2) a novel kernel compression method to reduce oversized kernels. Experimental results show that our approach achieves much higher performance than state-of-the-art approaches at acceptable cost.