Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising solution for accelerating computation-intensive tasks because they offer a good trade-off between energy efficiency and flexibility. One challenging research topic is how to deploy loops onto CGRAs effectively within an acceptable compilation time. Modulo scheduling (MS) has been shown to be efficient at deploying loops onto CGRAs. However, existing CGRA MS algorithms still struggle to map loops with high performance within an acceptable compilation time, especially when mapping large and irregular loops onto CGRAs with limited computational and routing resources. This is mainly due to under-utilization of the available buffer resources on the CGRA, unawareness of critical mapping constraints, and time-consuming methods for solving the temporal and spatial mapping. This article focuses on improving the performance and compilation robustness of the modulo scheduling mapping algorithm for CGRAs. We decompose the CGRA MS problem into a temporal mapping problem and a spatial mapping problem and reorganize the processes inside these two problems. For the temporal mapping problem, we provide a comprehensive and systematic mapping flow that includes a powerful buffer allocation algorithm and efficient algorithms for solving interconnection and computational constraints. For the spatial mapping problem, we develop a fast and stable spatial mapping algorithm with backtracking and reordering mechanisms. Our MS mapping algorithm is able to map loops onto CGRAs with higher performance and shorter compilation time. Experimental results show that, given the same compilation time budget, our mapping algorithm achieves a higher compilation success rate. Among the successfully compiled loops, our approach improves performance by 5.4 to 14.2 percent and takes 24× to 1099× less compilation time on average compared with state-of-the-art CGRA mapping algorithms.
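
For readers unfamiliar with modulo scheduling, the temporal half of the problem can be illustrated with a minimal sketch. This is not the algorithm of the article: it is a generic textbook-style greedy scheduler, and it assumes unit-latency operations, no loop-carried dependences, and no routing or buffer constraints (the very constraints the article addresses). All names (`res_mii`, `temporal_map`, `modulo_schedule`) are hypothetical and chosen for illustration only.

```python
from math import ceil


def res_mii(num_ops, num_pes):
    """Resource-constrained lower bound on the initiation interval (II)."""
    return ceil(num_ops / num_pes)


def temporal_map(ops, deps, num_pes, ii):
    """Greedily assign each operation a time step for a given II.

    ops  : operation names in topological order (assumption of this sketch)
    deps : (src, dst) intra-iteration edges, each with latency 1 (assumption)
    Returns {op: time} or None if some operation cannot be placed.
    """
    time = {}
    slot_use = [0] * ii  # how many PEs are busy in each modulo time slot
    for op in ops:
        # An operation may start one step after its latest predecessor.
        earliest = max((time[s] + 1 for s, d in deps if d == op), default=0)
        # Trying ii consecutive steps visits every modulo slot exactly once.
        for t in range(earliest, earliest + ii):
            if slot_use[t % ii] < num_pes:
                time[op] = t
                slot_use[t % ii] += 1
                break
        else:
            return None  # every modulo slot is full at this II
    return time


def modulo_schedule(ops, deps, num_pes, max_ii=16):
    """Try increasing II values, starting from the resource lower bound."""
    for ii in range(res_mii(len(ops), num_pes), max_ii + 1):
        schedule = temporal_map(ops, deps, num_pes, ii)
        if schedule is not None:
            return ii, schedule
    return None


if __name__ == "__main__":
    ops = ["a", "b", "c", "d", "e", "f"]
    deps = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
    print(modulo_schedule(ops, deps, num_pes=4))
```

In a full mapper, each scheduled operation would then be placed on a specific processing element (the spatial mapping), and a failure there would force a reschedule or a larger II; the sketch stops at the temporal step.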

Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling

Routability-Enhanced Scheduling for Application Mapping on CGRAs

SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs

Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs

Energy-aware Loops Mapping on Multi-Vdd CGRAs Without Performance Degradation

CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Joint Modulo Scheduling and $V_{\mathrm{DD}}$ Assignment for Loop Mapping on Dual-$V_{\mathrm{DD}}$ CGRAs

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Map-reduce inspired loop parallelization on CGRA

Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration

GEML: GNN-based efficient mapping method for large loop applications on CGRA

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures

MapReduce Inspired Loop Mapping for Coarse-Grained Reconfigurable Architecture

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

Mapping Multi-Level Loop Nests Onto CGRAs Using Polyhedral Optimizations

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs

Formulating Data-arrival Synchronizers in Integer Linear Programming for CGRA Mapping