Abstract: The coarse-grained reconfigurable architecture (CGRA) is a promising programmable hardware platform with high power efficiency and high performance. However, compiling and optimizing loops with irregular branches remains a challenge that keeps CGRAs from reaching their performance potential. Existing predication techniques, such as partial predication (PP) and full predication (FP), conservatively build the software pipeline with a static initiation interval (II) derived from the maximum graph; because only part of that graph is actually executed in each loop iteration, performance is underexploited. To exploit more loop-level parallelism for irregular branches, this article proposes a novel dynamic-II pipeline (DIP) scheme, which realizes a pipeline with a variable II by accommodating multiple iterations of the short path in one static configuration. Since the DIP scheme is effective for only certain types of branches, this article also designs a hybrid compilation framework that integrates other complementary methods and selects the appropriate method for each source program according to a proposed performance evaluation model. Experimental results show that 1) the hybrid compilation framework can effectively extract branch features and correctly choose and implement the corresponding branch processing methods within acceptable compile time, and 2) compared with PP and FP, DIP reduces total execution time (TET) by 27.21% and 22.04% on average, respectively, when the execution probability of the short branch is 50%.
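
The gap between a static II and a variable II can be illustrated with a toy cycle count. The sketch below is not the article's performance evaluation model. It assumes that every iteration under a static schedule is issued `ii_max` cycles apart (the II of the maximum graph), that a variable-II pipeline can issue a short-path iteration after only `ii_short` cycles, and that pipeline prologue and epilogue can be ignored. The function names, the parameters `ii_max` and `ii_short`, and the numbers in the usage example are all hypothetical.

```python
from typing import Sequence


def static_ii_cycles(takes_short: Sequence[bool], ii_max: int) -> int:
    """Cycles when every iteration is issued ii_max cycles apart.

    Assumption: the static II comes from the maximum graph, so the
    branch outcome of an iteration does not change its issue interval.
    """
    return len(takes_short) * ii_max


def variable_ii_cycles(takes_short: Sequence[bool], ii_max: int, ii_short: int) -> int:
    """Cycles when short-path iterations are issued after ii_short cycles.

    Assumption (toy model only): a short-path iteration costs ii_short,
    a long-path iteration still costs ii_max.
    """
    return sum(ii_short if short else ii_max for short in takes_short)


def relative_reduction(takes_short: Sequence[bool], ii_max: int, ii_short: int) -> float:
    """Fraction of cycles saved by the variable-II count over the static one."""
    base = static_ii_cycles(takes_short, ii_max)
    if base == 0:
        return 0.0
    return 1.0 - variable_ii_cycles(takes_short, ii_max, ii_short) / base


if __name__ == "__main__":
    # Hypothetical trace: 100 iterations, short path taken every other time.
    trace = [i % 2 == 0 for i in range(100)]
    print(static_ii_cycles(trace, ii_max=4))
    print(variable_ii_cycles(trace, ii_max=4, ii_short=2))
    print(relative_reduction(trace, ii_max=4, ii_short=2))
```

In this toy model the saving grows with the share of short-path iterations and with the gap between `ii_max` and `ii_short`, and it is zero when no iteration takes the short path. The numbers it produces are illustrative and are not comparable to the measurements reported in the abstract.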

Data parallelism optimization for the CGRA loop pipelining mapping

Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs

Joint Affine Transformation and Loop Pipelining for Mapping Nested Loop on CGRAs.

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Scalable-Grain Pipeline Parallelization Method For Multi-Core Systems

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Mapping Multi-Level Loop Nests Onto CGRAs Using Polyhedral Optimizations.

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Map-reduce inspired loop parallelization on CGRA

Energy-aware Loops Mapping on Multi-Vdd CGRAs Without Performance Degradation

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs

Polyhedral Model Based Mapping Optimization of Loop Nests for CGRAs

TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.

Dynamic-II Pipeline: Compiling Loops with Irregular Branches on Static-Scheduling CGRA

MapReduce Inspired Loop Mapping for Coarse-Grained Reconfigurable Architecture

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications

GEML: GNN-based efficient mapping method for large loop applications on CGRA