Affine Transformations for Communication and Reconfiguration Optimization of Loops on CGRAs

Shouyi Yin, Dajiang Liu, Leibo Liu, Shaojun Wei
DOI: https://doi.org/10.1587/transinf.e96.d.1582
2013-01-01
IEICE Transactions on Information and Systems
Abstract: A coarse-grained reconfigurable architecture (CGRA) is typically a hybrid architecture composed of a reconfigurable processing unit (RPU) and a host microprocessor. Many compute-intensive applications (e.g., loop nests) are often mapped onto RPUs to speed up program execution. However, communication volume and reconfiguration cost are two bottlenecks for RPU performance. Loop transformations that overcome these bottlenecks and tap the potential of RPUs are therefore of great significance. In this paper, an automatic loop transformation approach for RPUs is proposed, in which communication cost and reconfiguration cost are considered jointly. Experimental results show that our scheme can save up to 22.7% of execution time on average on partial differential equation (PDE) solver kernels compared with an approach that considers only communication cost, and performs much better than the loop unrolling scheme on a great majority of loop kernels. In addition, the run-time complexity is acceptable for practical cases.