Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve the context bottleneck. One method loads the context into the CGRA at run time. This method occupies very little on-chip memory but induces very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure: the context is loaded into the on-chip context memory at the boot phase, and broadcasting the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle basis. However, the size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application complexity. This paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a CGRA. In this architecture, context is dynamically transferred into the CGRA. Utilizing a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. For the data bandwidth issue, data preloading is the most frequently used approach to hide input data latency and speed up the data transmission process. Rather than fundamentally reducing the amount of input data, preloading overlaps data transfer with computation. However, the data preloading method cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale increases. This paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to the efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory from the data transfer burden so that the performance is significantly improved. As a result of using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. Consequently, H.264 High Profile video decoding at 1080p@35.7 fps can be achieved on the PCC and HDM architecture at a working frequency of 200 MHz. Further, the size of the on-chip context memory no longer restricted complex applications, which were efficiently executed on the PCC and HDM architecture.
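
To put the reported throughput in perspective, the following is a minimal back-of-the-envelope sketch of the cycle budget implied by the two figures quoted in the abstract (200 MHz and 35.7 fps). It is not taken from the paper: the coded frame size of 1920x1088 (120 x 68 macroblocks of 16x16 samples) is an assumption about how a 1080p frame is coded in H.264, and all function and variable names are hypothetical.

```python
# Cycle budget implied by the figures reported in the abstract.
# From the abstract: 200 MHz working frequency, 35.7 frames per second.
# Assumption (not from the abstract): a 1080p frame is coded as
# 1920x1088 samples, i.e. 120 x 68 macroblocks of 16x16 samples.

CLOCK_HZ = 200_000_000        # working frequency reported in the abstract
FRAMES_PER_SECOND = 35.7      # decoding rate reported in the abstract
MB_SIZE = 16                  # H.264 macroblock edge length in samples
CODED_WIDTH = 1920            # assumed coded frame width
CODED_HEIGHT = 1088           # assumed coded frame height (1080 padded to 16)


def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Clock cycles available to decode one frame."""
    return clock_hz / fps


def macroblocks_per_frame(width: int, height: int, mb_size: int = MB_SIZE) -> int:
    """Number of macroblocks in one coded frame."""
    return (width // mb_size) * (height // mb_size)


frame_budget = cycles_per_frame(CLOCK_HZ, FRAMES_PER_SECOND)
mb_count = macroblocks_per_frame(CODED_WIDTH, CODED_HEIGHT)
mb_budget = frame_budget / mb_count

print(f"cycles per frame:      {frame_budget:,.0f}")
print(f"macroblocks per frame: {mb_count}")
print(f"cycles per macroblock: {mb_budget:.1f}")
```

Under these assumptions the budget is roughly 5.6 million cycles per frame, or about 687 cycles per macroblock, within which context loading, data transfer, and computation all have to fit. This illustrates why hiding or removing context and data transfer latency matters at this operating point.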

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).

Efficient and Flexible Memory Architecture to Alleviate Data and Context Bandwidth Bottlenecks of Coarse-Grained Reconfigurable Arrays

On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.

CSA-CiM: Enhancing Multi-Functional Computing-in-Memory with Configurable Sense Amplifiers

Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture

Row-based Configuration Mechanism for a 2-D Processing Element Array in Coarse-Grained Reconfigurable Architecture

An Utilization-Efficient Context Memory Design for Reconfigurable Processing Array with Locally Shared Strategy

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs

The Organization Of On-Chip Data Memory In One Coarse-Grained Reconfigurable Architecture

Hierarchical Representation of On-Chip Context to Reduce Reconfiguration Time and Implementation Area for Coarse-Grained Reconfigurable Architecture

Configuration Approaches to Improve Computing Efficiency of Coarse-Grained Reconfigurable Multimedia Processor.

Coarse-grained reconfigurable multimedia processor

MDCRA: A Reconfigurable Accelerator Framework for Multiple Dataflow Lanes

RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

Reconfiguration Process Optimization Of Dynamically Coarse Grain Reconfigurable Architecture For Multimedia Applications

Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning

DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs.

A High-Performance Memory Storage Architecture for MPEG2 Decoder