A Scalable Hybrid Architecture for High Performance Data-Parallel Applications

Moucheng Yang, Jifang Jin, Zhehao Li, Xuegong Zhou, Shaojun Wang, Lingli Wang
DOI: https://doi.org/10.1109/fpt.2017.8280138
2017-01-01
Abstract: This paper presents a scalable hybrid architecture for high-performance data-parallel applications on tightly coupled shared-memory CPU-FPGA systems such as the Xilinx Zynq SoC. The proposed architecture has three aims: 1) to simplify the development of hardware acceleration for data-parallel applications; 2) to reach the performance limit imposed by memory access and/or the hardware resources available on an FPGA; 3) to reduce the overhead caused by task scheduling and device drivers. The architecture can be used as a generic template for implementing data-parallel applications. Each task in an application is mapped to one hardware accelerator, called a "kernel". Several identical instances of each hardware kernel execute concurrently to provide parallelism. By deploying the maximum number of instances of the hardware kernel, we make full use of the memory-access bandwidth and the resources available on the FPGA. To improve performance further, task scheduling and device drivers are implemented on the FPGA as a hardware scheduler called DmaScheduler. Experimental results on a Zynq FPGA show speedups of 2.93x to 51.25x over existing FPGA implementations for image processing, Black-Scholes option pricing, matrix multiplication, and clustering applications.
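The idea of deploying the maximum number of kernel instances can be illustrated with a minimal sketch. It is not taken from the paper: the function name, the resource categories, and all numbers below are hypothetical, and the model simply assumes that the instance count is capped by whichever limit is reached first, FPGA resources or memory bandwidth.

```python
def max_kernel_instances(fpga_resources, kernel_resources,
                         memory_bandwidth, kernel_bandwidth):
    """Return the largest instance count that fits both limits.

    Hypothetical model: assumes at least one nonzero resource
    requirement and a positive per-instance bandwidth demand.
    """
    # Resource bound: the scarcest resource type decides how many copies fit.
    resource_bound = min(
        fpga_resources[name] // need
        for name, need in kernel_resources.items()
        if need > 0
    )
    # Bandwidth bound: instances beyond this point would only wait on memory.
    bandwidth_bound = int(memory_bandwidth // kernel_bandwidth)
    return min(resource_bound, bandwidth_bound)


# Hypothetical numbers, chosen only to exercise the function.
fpga = {"lut": 50_000, "dsp": 200, "bram": 120}
kernel = {"lut": 6_000, "dsp": 30, "bram": 10}
instances = max_kernel_instances(fpga, kernel,
                                 memory_bandwidth=4_000,
                                 kernel_bandwidth=450)
print(instances)  # 6: DSP blocks run out before memory bandwidth does
```

With these illustrative values the design is resource-bound; lowering `memory_bandwidth` to 1,000 would make it bandwidth-bound at two instances instead.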