Abstract: The major bottleneck of coarse-grained reconfigurable arrays (CGRAs) is excessive configuration overhead, which prevents their computing potential from being fully utilized. At run-time, the function of a CGRA can be fully and dynamically reconfigured by changing contexts, so context switching occurs very frequently. At the same time, the configuration time of a CGRA is very long. This paper proposes three configuration approaches to reduce the interval latency when switching configuration contexts: input data relocation (IDR), line-based context switching (LCS), and loop interval minimization (LIM). IDR relocates input data to the first stage of the pipeline, which reduces the delay before the input data of the next data flow graph (DFG) becomes available. LCS is a line-based context switching mechanism for adjacent independent DFGs that reduces the interval of context switching, thereby expanding the depth of the pipeline. LIM minimizes the interval between loops. Simulations on a coarse-grained reconfigurable processor called the reconfigurable multimedia system (REMUS) show that 1080p@30 fps H.264 high profile video decoding can be achieved at a 200 MHz working frequency. For the AVS and MPEG2 decoding algorithms, higher performance of 1080p@39 fps and 1080p@41 fps, respectively, can be achieved.
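To put the reported figures in perspective, the following minimal sketch converts a frame rate and clock frequency into an average cycle budget per frame and per macroblock. It is not taken from the paper: the function name `cycle_budget` is hypothetical, the 16x16 macroblock size and the rounding of the frame height up to a whole number of macroblocks are assumptions, and the per-macroblock figure assumes for illustration that the whole clock budget is spread evenly over all macroblocks.

```python
import math


def cycle_budget(clock_hz, fps, width=1920, height=1080, mb_size=16):
    """Return (macroblocks per frame, cycles per frame, cycles per macroblock).

    Assumption: the frame is padded up to a whole number of mb_size x mb_size
    macroblocks, and the clock budget is shared evenly among them.
    """
    mbs_per_row = math.ceil(width / mb_size)
    mbs_per_col = math.ceil(height / mb_size)
    mbs_per_frame = mbs_per_row * mbs_per_col
    cycles_per_frame = clock_hz / fps
    cycles_per_mb = cycles_per_frame / mbs_per_frame
    return mbs_per_frame, cycles_per_frame, cycles_per_mb


if __name__ == "__main__":
    # Frame rates reported in the abstract at a 200 MHz working frequency.
    for name, fps in [("H.264 high profile", 30), ("AVS", 39), ("MPEG2", 41)]:
        mbs, per_frame, per_mb = cycle_budget(200e6, fps)
        print(f"{name}: {mbs} macroblocks/frame, "
              f"{per_frame:.0f} cycles/frame, {per_mb:.1f} cycles/macroblock")
```

Under these assumptions, a higher frame rate at the same clock leaves fewer cycles per macroblock, which is why cycles spent on context switching count directly against the achievable frame rate.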

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Polyhedral Model Based Mapping Optimization Of Loop Nests For Cgras

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

Map-reduce inspired loop parallelization on CGRA

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture

Trigger-Centric Loop Mapping on CGRAs

A Survey of Coarse-Grained Reconfigurable Architecture and Design

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications