Abstract: The coarse-grained reconfigurable architecture (CGRA) has proven to be energy efficient in several specific domains. In CGRAs, the on-chip memory hierarchy, which comprises the context memory and data memory organizations, must be designed carefully to achieve appropriate tradeoffs among three aspects: 1) performance; 2) area; and 3) power. In this paper, two techniques targeting the context memory and data memory organizations, the hierarchical configuration context (HCC) and the lifetime-based data-memory organization (LDO), are proposed to compress the on-chip memory space and to reduce both the reconfiguration time and the data-reference time. In the HCC, contexts are constructed hierarchically to completely eliminate their repetitive portions, which not only reduces the overall context storage but also alleviates the context transportation overhead. A fast context-indexing mechanism is also proposed in the HCC to achieve fast reconfiguration, since the hierarchically organized contexts can be located and accessed conveniently. In the LDO, on-chip data are classified into two types based on their lifetime. Short-lifetime data are stored in a first-in first-out (FIFO) buffer, which automatically increases the reuse ratio of the memory space, whereas long-lifetime data, which are referenced several times, are stored in random access memory (RAM). The HCC and the LDO are used in a CGRA core called the reconfigurable processing unit (RPU). Two RPUs are integrated into a reconfigurable computing processor (RCP) called the REconfigurable MUlti-media System, High-Performance Processor (REMUS_HPP). Owing to the HCC, the total context storage required for H.264 decoding is reduced by 77% compared with a traditional nonhierarchical system. Owing to the LDO, the normalized on-chip data memory size of the REMUS_HPP at the same performance level is only 23.8% and 14.8% of those of XPP-III (a high-performance RCP) and ADRES (a low-power RCP), respectively.
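The storage saving from eliminating repeated context portions can be illustrated with a minimal sketch. It assumes a deliberately simplified two-level hierarchy (a pool of unique per-row configurations plus a per-context index list); the class `HierarchicalContextStore`, the helper `flat_storage_words`, the one-word-per-index cost, and all configuration values are hypothetical and are not the HCC's actual context format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical: one row configuration = the configuration words for one PE row.
RowConfig = Tuple[int, ...]


@dataclass
class HierarchicalContextStore:
    """Two-level context store (an assumed simplification of the HCC idea).

    Level 1: a pool holding each distinct row configuration exactly once.
    Level 2: per context, a list of indices into that pool.
    """
    row_pool: List[RowConfig] = field(default_factory=list)
    row_index: Dict[RowConfig, int] = field(default_factory=dict)
    contexts: Dict[str, List[int]] = field(default_factory=dict)

    def add_context(self, name: str, rows: List[RowConfig]) -> None:
        # Repeated rows are stored once; the context keeps only their indices.
        indices = []
        for row in rows:
            if row not in self.row_index:
                self.row_index[row] = len(self.row_pool)
                self.row_pool.append(row)
            indices.append(self.row_index[row])
        self.contexts[name] = indices

    def fetch(self, name: str) -> List[RowConfig]:
        # Indexed lookup: context name -> row indices -> row configurations.
        return [self.row_pool[i] for i in self.contexts[name]]

    def storage_words(self) -> int:
        # Assumes each index occupies one word, the same width as a config word.
        pooled = sum(len(row) for row in self.row_pool)
        pointers = sum(len(idx) for idx in self.contexts.values())
        return pooled + pointers


def flat_storage_words(contexts: Dict[str, List[RowConfig]]) -> int:
    # A nonhierarchical store keeps every row of every context verbatim.
    return sum(len(row) for rows in contexts.values() for row in rows)


# Usage with made-up configuration words: two contexts built from two row types.
row_x = (0x11, 0x22, 0x33, 0x44)
row_y = (0x55, 0x66, 0x77, 0x88)
kernels = {
    "ctx_a": [row_x, row_x, row_y, row_x],
    "ctx_b": [row_x, row_y, row_y, row_y],
}
store = HierarchicalContextStore()
for name, rows in kernels.items():
    store.add_context(name, rows)

assert store.fetch("ctx_a") == kernels["ctx_a"]  # reconstruction is lossless
print(flat_storage_words(kernels), store.storage_words())  # 32 16
```

In this toy case the flat store needs 32 words and the two-level store 16. The saving grows with the amount of repetition, and the index lists also shrink what must be transported at reconfiguration time, which is the effect the HCC exploits at a finer and deeper granularity than this sketch.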
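The lifetime-based split can be sketched in the same spirit. This assumes a hypothetical rule (a value read exactly once within `threshold` cycles goes to the FIFO; everything else goes to RAM) and models the FIFO with `collections.deque`. `DataItem`, `classify`, `fifo_depth`, and the threshold are illustrative names and parameters, not the LDO's actual classification criteria.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataItem:
    name: str
    produced_at: int  # cycle in which the value is written
    last_use_at: int  # cycle of its final read
    num_reads: int    # number of times the value is referenced

    @property
    def lifetime(self) -> int:
        return self.last_use_at - self.produced_at


def classify(items: List[DataItem], threshold: int) -> Tuple[List[DataItem], List[DataItem]]:
    # Hypothetical rule: read once and short-lived -> FIFO; everything else -> RAM.
    fifo_items, ram_items = [], []
    for item in items:
        if item.num_reads == 1 and item.lifetime <= threshold:
            fifo_items.append(item)
        else:
            ram_items.append(item)
    return fifo_items, ram_items


def fifo_depth(fifo_items: List[DataItem]) -> int:
    """Replay writes and reads through a deque and return the peak occupancy.

    A FIFO only works if values are read in the order they were written;
    a violation raises ValueError. Writes are replayed before reads in the
    same cycle, so the returned depth is conservative.
    """
    events = []
    for item in fifo_items:
        events.append((item.produced_at, 0, item.name))  # 0 = write
        events.append((item.last_use_at, 1, item.name))  # 1 = read
    events.sort(key=lambda e: (e[0], e[1]))  # stable: keeps listing order on ties
    queue: deque = deque()
    peak = 0
    for _, kind, name in events:
        if kind == 0:
            queue.append(name)
            peak = max(peak, len(queue))
        elif not queue or queue.popleft() != name:
            raise ValueError(f"{name} is not read in FIFO order")
    return peak


# Usage: six streaming intermediates, each read once two cycles after being
# written, plus one coefficient table that is referenced eight times.
short = [DataItem(f"t{i}", produced_at=i, last_use_at=i + 2, num_reads=1) for i in range(6)]
coef = DataItem("coef", produced_at=0, last_use_at=40, num_reads=8)
fifo_items, ram_items = classify(short + [coef], threshold=4)
print(len(fifo_items), fifo_depth(fifo_items), [it.name for it in ram_items])  # 6 3 ['coef']
```

Here six short-lived values share three FIFO entries because each entry is freed as soon as its value is consumed, while the multiply-referenced table stays in RAM where it can be read repeatedly.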
REMUS_HPP is implemented on a 48.9-mm² die in TSMC 65-nm technology and, running at 200 MHz, achieves 1920 × 1088 at 30 fps H.264 high-profile decoding. Compared with XPP-III, the REMUS_HPP achieves a 1.81× performance improvement and 4.75× higher energy efficiency.
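As a back-of-the-envelope check, assuming H.264's 16 × 16-pixel macroblocks and treating the 200-MHz clock as the chip's only time base, the stated decoding target corresponds to a budget of roughly 817 clock cycles per macroblock for the whole chip (a figure derived here from the quoted numbers):

$$
\begin{aligned}
N_{\text{MB}} &= \frac{1920}{16} \times \frac{1088}{16} = 120 \times 68 = 8160 \ \text{macroblocks/frame},\\
\frac{f_{\text{clk}}}{N_{\text{MB}} \cdot \text{fps}} &= \frac{200 \times 10^{6}}{8160 \times 30} = \frac{200 \times 10^{6}}{244\,800} \approx 817 \ \text{cycles/macroblock}.
\end{aligned}
$$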

Using System Hyper Pipelining (SHP) to Improve the Performance of a Coarse-Grained Reconfigurable Architecture (CGRA) Mapped on an FPGA

Design of Hardware Pipelining Processor

Scalable-Grain Pipeline Parallelization Method For Multi-Core Systems

Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware

Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs

Automatic multi-dimensional pipelining for high-level synthesis of dataflow accelerators

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

Dynamic-II Pipeline: Compiling Loops with Irregular Branches on Static-Scheduling CGRA

Hardware-Software Co-Design Flow for Embedded Coarse-Grained Reconfigurable Processor

System Level Asynchronous Virtual Pipeline on Dynamically and Partially Reconfigurable Architecture

Joint Affine Transformation and Loop Pipelining for Mapping Nested Loop on CGRAs

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Application of Pipeline Reconfiguration Technique in Reconfigurable Processor

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications

On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time

A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration