Abstract: Pipelining is an effective technique for improving loop performance by overlapping the execution of several iterations, particularly on reconfigurable platforms, which are more coarse-grained. In this paper, we use a reconfigurable platform to accelerate loop-based applications by reconstructing the pipeline structure while the application executes. Based on this concept, optimization strategies such as duplexing and splitting of function units are applied from the instruction level to the task level. First, a loop is abstracted as a weighted data flow graph (WDFG), in which nodes represent tasks and edges represent inter-task dependencies. Node weights indicate task execution times, and edge weights indicate communication overheads. Based on this abstraction, we propose an algorithm that automatically maps pipelined loops onto reconfigurable hardware and selects whether duplexing or splitting is more appropriate. The algorithm relies on profiling information from the WDFG, such as execution times and communication overheads. Several test cases from the EEMBC benchmark are then selected to evaluate our approach. The evaluation is carried out in two ways. First, we run software simulations to assess the effectiveness of the algorithm. Second, a prototype system is implemented on a state-of-the-art FPGA board to evaluate the practicality of our approach on a reconfigurable platform. Pipeline performance indicators such as speedup, throughput, and efficiency are measured in both settings. In software simulation, the speedup and throughput of the optimized pipeline improve by at least 2 times, and efficiency increases by 1.1 to 1.5 times. On the hardware platform, where communication cost and reconfiguration delay apply, speedup and efficiency increase by 1.5 times, and throughput increases by 1.5 to 2 times. Experimental results demonstrate that our approach achieves satisfactory performance in both effectiveness and practicality.
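
To make the duplexing-versus-splitting decision concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a linear chain of stages as a simplification of the WDFG, and every name and parameter in it (`Stage`, `metrics`, `duplex`, `split`, `choose`, `dispatch_cost`, `link_cost`) is hypothetical. Duplexing is modeled as adding a second copy of the bottleneck stage at the price of an assumed dispatch overhead. Splitting is modeled as cutting the stage into two equal halves joined by an assumed link cost. Efficiency is taken as speedup divided by the number of function units.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stage:
    name: str
    exec_time: float       # node weight: task execution time
    comm_out: float = 0.0  # edge weight: communication overhead to the next stage
    copies: int = 1        # number of duplicated function units (assumption)


def metrics(stages, iterations):
    """Speedup, throughput and efficiency of a linear pipeline (simplified model)."""
    latency = [s.exec_time + s.comm_out for s in stages]
    # A stage with c copies can start a new iteration every latency / c time units.
    interval = max(l / s.copies for l, s in zip(latency, stages))
    sequential = iterations * sum(s.exec_time for s in stages)
    pipelined = sum(latency) + (iterations - 1) * interval
    speedup = sequential / pipelined
    units = sum(s.copies for s in stages)
    return {
        "speedup": speedup,
        "throughput": iterations / pipelined,
        "efficiency": speedup / units,
    }


def bottleneck(stages):
    """Index of the stage that limits the initiation interval."""
    return max(
        range(len(stages)),
        key=lambda i: (stages[i].exec_time + stages[i].comm_out) / stages[i].copies,
    )


def duplex(stages, i, dispatch_cost):
    """Add one copy of stage i; dispatch_cost is an assumed distribution overhead."""
    out = list(stages)
    s = stages[i]
    out[i] = replace(s, copies=s.copies + 1, comm_out=s.comm_out + dispatch_cost)
    return out


def split(stages, i, link_cost):
    """Cut stage i into two equal halves joined by an edge of weight link_cost."""
    s = stages[i]
    half = s.exec_time / 2
    first = Stage(s.name + "_a", half, link_cost, s.copies)
    second = Stage(s.name + "_b", half, s.comm_out, s.copies)
    return list(stages[:i]) + [first, second] + list(stages[i + 1:])


def choose(stages, iterations, link_cost, dispatch_cost):
    """Pick whichever transformation of the bottleneck gives higher throughput."""
    i = bottleneck(stages)
    candidates = {
        "duplex": duplex(stages, i, dispatch_cost),
        "split": split(stages, i, link_cost),
    }
    best = max(candidates, key=lambda k: metrics(candidates[k], iterations)["throughput"])
    return best, candidates[best]


loop = [Stage("load", 2.0), Stage("compute", 4.0), Stage("store", 2.0)]
print(metrics(loop, 100))
print(choose(loop, 100, link_cost=0.5, dispatch_cost=0.0)[0])
print(choose(loop, 100, link_cost=0.5, dispatch_cost=1.5)[0])
```

In this toy model the choice depends only on the two overhead parameters: a cheap dispatch favors duplexing, while an expensive dispatch combined with a cheap inter-stage link favors splitting. A real mapping would also have to account for reconfiguration delay and for WDFGs that are not simple chains.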

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures

Joint Affine Transformation and Loop Pipelining for Mapping Nested Loop on CGRAs

Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Scalable-Grain Pipeline Parallelization Method For Multi-Core Systems

Low-Power Loop Parallelization Onto CGRA Utilizing Variable Dual VDD

Dynamic-II Pipeline: Compiling Loops with Irregular Branches on Static-Scheduling CGRA

MapReduce Inspired Loop Mapping for Coarse-Grained Reconfigurable Architecture

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD

Map-reduce inspired loop parallelization on CGRA

Mapping Multi-Level Loop Nests Onto CGRAs Using Polyhedral Optimizations

Using System Hyper Pipelining (SHP) to Improve the Performance of a Coarse-Grained Reconfigurable Architecture (CGRA) Mapped on an FPGA

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Affine Transformations for Communication and Reconfiguration Optimization of Loops on CGRAs

Polyhedral Model Based Mapping Optimization Of Loop Nests For CGRAs

Automatic Loop-Based Pipeline Optimization on Reconfigurable Platform

GEML: GNN-based efficient mapping method for large loop applications on CGRA

Pipeline optimization for loops on reconfigurable platform

Mixed-granularity Parallel Coarse-Grained Reconfigurable Architecture

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay