Abstract: Coarse-grained reconfigurable architecture (CGRA) is a promising class of programmable hardware that offers high power efficiency and high performance. However, compiling and optimizing loops with irregular branches on CGRAs remains a challenge to realizing that performance potential. Existing predication techniques, such as partial predication (PP) and full predication (FP), conservatively implement a software pipeline with a static initiation interval (II) obtained from the maximum graph; because only part of that graph is actually executed in each loop iteration, performance is underexploited. To exploit more loop-level parallelism for irregular branches, this article proposes a novel dynamic-II pipeline (DIP) scheme, which realizes a pipeline with variable II by accommodating multiple iterations of the short path in one static configuration. Since the DIP scheme is effective for only certain types of branches, this article designs a hybrid compilation framework that integrates other complementary methods and selects the appropriate method for a source program according to a proposed performance evaluation model. Experimental results show that: 1) the hybrid compilation framework can effectively extract branch features and correctly choose and implement the corresponding branch processing methods within acceptable compile time; and 2) compared with PP and FP, DIP brings a significant total execution time (TET) reduction of 27.21% and 22.04% on average, respectively, when the execution probability of the short branch is 50%.
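The contrast between a static II and a variable II can be illustrated with a deliberately simplified toy model. This is only a sketch under stated assumptions, not the performance evaluation model or the mapping scheme of the article. The function names, the per-path II values, and the branch trace are all hypothetical. Pipeline prologue and epilogue are ignored, and the model does not describe how DIP packs several short-path iterations into one static configuration.

```python
# Toy cycle-count model (hypothetical; not the article's evaluation model).
# Each loop iteration takes either the short or the long branch path.

def static_ii_cycles(takes_short, ii_max):
    """Predication-style pipeline: every iteration is issued ii_max cycles
    apart, regardless of which path it actually executes."""
    return len(takes_short) * ii_max


def variable_ii_cycles(takes_short, ii_long, ii_short):
    """Idealized variable-II pipeline: each iteration is charged the II of
    the path it actually takes (assumed values, not measured ones)."""
    return sum(ii_short if short else ii_long for short in takes_short)


# Hypothetical trace: 8 iterations, short path taken every other iteration.
trace = [True, False] * 4

static_total = static_ii_cycles(trace, ii_max=4)
variable_total = variable_ii_cycles(trace, ii_long=4, ii_short=2)

print(static_total)    # 32
print(variable_total)  # 24
```

In this toy setting the gap between the two totals widens as the short path is taken more often or as its assumed II shrinks relative to the long path. The numbers are illustrative only and are not intended to reproduce the reported results.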

Joint Affine Transformation and Loop Pipelining for Mapping Nested Loop on CGRAs

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures

Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

Mapping Multi-Level Loop Nests Onto CGRAs Using Polyhedral Optimizations

Affine Transformations for Communication and Reconfiguration Optimization of Loops on CGRAs

MapReduce Inspired Loop Mapping for Coarse-Grained Reconfigurable Architecture

Map-reduce inspired loop parallelization on CGRA

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Energy-aware Loops Mapping on Multi-Vdd CGRAs Without Performance Degradation

Dynamic-II Pipeline: Compiling Loops with Irregular Branches on Static-Scheduling CGRA

Trigger-Centric Loop Mapping on CGRAs

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Polyhedral Model Based Mapping Optimization of Loop Nests for CGRAs

Mapping Optimization Of Affine Loop Nests For Reconfigurable Computing Architecture

Stress-Aware Loops Mapping on CGRAs with Considering NBTI Aging Effect

Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

GEML: GNN-based efficient mapping method for large loop applications on CGRA

Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs