CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs
Chilankamol Sunny, Satyajit Das, Kevin J. M. Martin, Philippe Coussy
DOI: https://doi.org/10.1109/tpds.2024.3402098
IF: 5.3
2024-06-04
IEEE Transactions on Parallel and Distributed Systems
Abstract: Coarse-Grained Reconfigurable Array (CGRA) architectures are popular as high-performance and energy-efficient computing devices. Compute-intensive loop constructs of complex applications are mapped onto CGRAs by modulo-scheduling the innermost loop dataflow graph (DFG). In state-of-the-art approaches, mapping quality is typically determined by the initiation interval (II), while the schedule length of one iteration is neglected. However, for nested loops, schedule length becomes important. In this article, we propose CREPE, a Concurrent Reverse-modulo-scheduling and Placement technique for CGRAs that minimizes both II and schedule length. CREPE performs simultaneous modulo-scheduling and placement coupled with dynamic graph transformations, generating good-quality mappings with high success rates. Furthermore, we introduce a compilation flow that maps nested loops onto the CGRA and modulo-schedules the innermost loop using CREPE. Experiments show that the proposed solution outperforms conventional approaches in mapping success rate and total execution time with no impact on compilation time. CREPE maps all kernels considered, whereas the state-of-the-art techniques Crimson and Epimap either failed to find a mapping or mapped at very high IIs. On a 2×4 CGRA, CREPE reports a 100% success rate and speed-ups of up to 5.9× over Crimson and 1.4× over Epimap, whose success rates are 78.5% and 46.4%, respectively.
computer science, theory & methods; engineering, electrical & electronic
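
The abstract's point that schedule length matters for nested loops can be illustrated with the standard cycle-count estimate for a modulo-scheduled loop. The following is a minimal sketch, not taken from the article: the function names are hypothetical, and it assumes the inner-loop pipeline is filled and drained once per outer iteration and that outer-loop code adds no extra cycles.

```python
def pipelined_loop_cycles(ii, schedule_length, iterations):
    """Cycles for one run of a modulo-scheduled loop.

    The first iteration takes `schedule_length` cycles; each further
    iteration starts `ii` cycles after the previous one.
    """
    if iterations <= 0:
        return 0
    return schedule_length + (iterations - 1) * ii


def nested_loop_cycles(ii, schedule_length, inner_iterations, outer_iterations):
    """Cycles for a two-level loop nest (assumption: the inner pipeline
    restarts on every outer iteration, with no outer-loop overhead)."""
    return outer_iterations * pipelined_loop_cycles(
        ii, schedule_length, inner_iterations
    )


# Two hypothetical mappings with the same II but different schedule lengths.
long_schedule = nested_loop_cycles(ii=2, schedule_length=12,
                                   inner_iterations=4, outer_iterations=100)
short_schedule = nested_loop_cycles(ii=2, schedule_length=6,
                                    inner_iterations=4, outer_iterations=100)
print(long_schedule, short_schedule)
```

With a short inner trip count, the schedule-length term is paid on every outer iteration, so under these assumptions two mappings with identical II can differ noticeably in total cycles.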