Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine.
Chao Wang, Junneng Zhang, Xi Li, Aili Wang, Xuehai Zhou
DOI: https://doi.org/10.1109/tpds.2015.2487346
IF: 5.3
2015-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Heterogeneous multicore platforms have been widely used in various areas to achieve both power efficiency and high performance. However, they pose significant challenges to researchers seeking to uncover more coarse-grained task-level parallelism. To support automatic task-parallel execution, this paper proposes an FPGA implementation of a hardware out-of-order scheduler on a heterogeneous multicore platform. The scheduler is capable of exploring potential inter-task dependencies, leading to a significant acceleration of dependence-aware applications. With the help of a renaming scheme, task dependencies are detected automatically during execution, and task-level Write-After-Write (WAW) and Write-After-Read (WAR) dependencies can then be eliminated dynamically. We extended instruction-level renaming techniques to perform task-level out-of-order execution, and implemented a prototype on a state-of-the-art Xilinx Virtex-5 FPGA device. Given the reconfigurable nature of FPGAs, our scheduler supports changing accelerators at runtime to improve flexibility. Experimental results demonstrate that our scheduler is efficient in both performance and resource usage.
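The renaming idea summarized in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design; it is a minimal model that assumes each task declares the variables it reads and writes, and the names `Task` and `rename_tasks` are hypothetical. Every write allocates a fresh version of its variable, so later writers never conflict with earlier readers or writers (no WAR or WAW), and only true Read-After-Write (RAW) dependencies remain.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical task descriptor: a name plus the variables it reads and writes."""
    name: str
    reads: List[str]
    writes: List[str]


def rename_tasks(tasks: List[Task]) -> List[List[int]]:
    """Return, for each task in program order, the indices of the tasks it
    truly depends on (RAW) once every write gets a fresh version."""
    version: Dict[str, int] = {}               # variable -> latest version number
    producer: Dict[Tuple[str, int], int] = {}  # (variable, version) -> writer index
    deps: List[List[int]] = []

    for i, task in enumerate(tasks):
        # Sources bind to the latest version visible at issue time.
        sources = [(v, version.get(v, 0)) for v in task.reads]
        deps.append(sorted({producer[s] for s in sources if s in producer}))

        # Each destination gets a new version, which removes WAW and WAR hazards.
        for v in task.writes:
            version[v] = version.get(v, 0) + 1
            producer[(v, version[v])] = i

    return deps


# Example: T2 rewrites "b" (WAW with T0, WAR with T1) but needs to wait for neither.
example = [
    Task("T0", reads=["a"], writes=["b"]),
    Task("T1", reads=["b"], writes=["c"]),
    Task("T2", reads=["d"], writes=["b"]),
    Task("T3", reads=["b"], writes=["e"]),
]
example_deps = rename_tasks(example)
```

In this example T1 depends only on T0 and T3 only on T2, so T0 and T2 could be dispatched to different accelerators at the same time. A real scheduler would also have to bound and recycle the version storage, which this sketch ignores.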