A Reschedulable Dataflow-SIMD Execution for Increased Utilization in CGRA Cross-Domain Acceleration

Chen Yin, Naifeng Jing, Jianfei Jiang, Qin Wang, Zhigang Mao
DOI: https://doi.org/10.1109/tcad.2022.3185544
IF: 2.9
2023-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-domain acceleration, control flow and memory accesses often degrade processing element (PE) utilization and array efficiency by breaking an intact dataflow graph (DFG) into regions with mismatched pipelining rates and access–execution stages. In this article, we propose a reschedulable dataflow and SIMD execution, which decouples a DFG with mismatched dataflow into multiple independent subgraphs. We map only one subgraph at a time, but fully unrolled, and reschedule the different subgraphs serially at runtime. Each subgraph therefore works in its own way without interfering with the others. At the same time, an individual subgraph can execute its dataflow as a stream to improve utilization, while its unrolled instances, composed as SIMD, facilitate request coalescing for efficient memory access. With lightweight hardware modifications, our design can be integrated into a general CGRA architecture. Experimental results show that our proposal improves performance and energy efficiency by $1.6\times$ and $1.8\times$ over a static-scheduling stream-dataflow CGRA (Plasticine) and by $1.5\times$ and $2.7\times$ over a dynamic-scheduling one (TIA), and outperforms Plasticine organized as vector-SIMD by $1.2\times$ and $1.4\times$.
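The decoupling, serial rescheduling, and SIMD coalescing described in the abstract can be illustrated with a small Python model. This is a sketch under our own assumptions, not the authors' compiler or hardware. Each DFG node carries a hypothetical integer rate label standing in for its pipelining rate, and a subgraph is a set of nodes joined by same-rate edges. Subgraphs run one at a time in dependency order, and each is replicated to fill an assumed 16-PE array with one PE per node (our reading of "fully unrolled"). The SIMD lanes' addresses are then merged per assumed 64-byte memory line. The function names `partition_by_rate`, `serial_schedule`, `unroll_factor`, and `coalesce` are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical model: node -> integer pipelining-rate label (an assumption,
# not the paper's representation); edges are (producer, consumer) pairs.


def partition_by_rate(rates, edges):
    """Group nodes joined by same-rate edges; rate-mismatched edges become cuts."""
    parent = {n: n for n in rates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if rates[u] == rates[v]:
            parent[find(u)] = find(v)

    groups = defaultdict(list)
    for n in rates:
        groups[find(n)].append(n)
    return [sorted(g) for g in groups.values()]


def serial_schedule(subgraphs, edges):
    """Order subgraphs so producers run before consumers (one at a time).

    Assumes the graph of cut edges is acyclic; graphlib raises CycleError otherwise.
    """
    owner = {n: i for i, g in enumerate(subgraphs) for n in g}
    preds = {i: set() for i in range(len(subgraphs))}
    for u, v in edges:
        if owner[u] != owner[v]:
            preds[owner[v]].add(owner[u])
    return [subgraphs[i] for i in TopologicalSorter(preds).static_order()]


def unroll_factor(subgraph, num_pes):
    """Copies of one subgraph that fit on the array, assuming one PE per node."""
    return max(1, num_pes // len(subgraph))


def coalesce(lane_addrs, line_bytes=64):
    """Merge SIMD-lane byte addresses into one request per memory line (assumed size)."""
    lines = []
    for addr in lane_addrs:
        line = addr // line_bytes
        if line not in lines:
            lines.append(line)
    return lines


# Toy DFG: address generation and loads, arithmetic, then a store.
rates = {"addr": 2, "ld_a": 2, "ld_b": 2, "add": 1, "mul": 1, "st": 3}
edges = [("addr", "ld_a"), ("addr", "ld_b"), ("ld_a", "add"),
         ("ld_b", "add"), ("add", "mul"), ("mul", "st")]
order = serial_schedule(partition_by_rate(rates, edges), edges)
for sg in order:
    print(sg, "x", unroll_factor(sg, num_pes=16))
# ['addr', 'ld_a', 'ld_b'] x 5
# ['add', 'mul'] x 8
# ['st'] x 16
print(coalesce([8 * lane for lane in range(16)]))  # 16 lanes of 8-byte words -> [0, 1]
```

Cutting at rate boundaries lets each subgraph stream at its own rate without stalling the others. In this toy run, 16 lanes of 8-byte loads collapse into two line requests instead of sixteen. A real design would also need buffering for the values that cross each cut, which this model omits.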