Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs

Yi Wan, Xianzhong Xie, Lingjie Yi, Bo Jiang, Junfan Chen, Yi Jiang
DOI: https://doi.org/10.1016/j.sysarc.2024.103113
IF: 5.836
2024-03-17
Journal of Systems Architecture
Abstract: Field-Programmable Gate Arrays (FPGAs), renowned for their high performance per watt, are widely used to accelerate Convolutional Neural Networks (CNNs) in edge computing environments, primarily through dataflow-based and instruction-set-based approaches. Compared with the instruction-set-based approach, which offers fast and versatile circuit design, the dataflow-based approach can significantly improve performance at the expense of design versatility. However, edge computing environments demand both high energy efficiency and adaptability to diverse scenarios. This paper proposes Pflow, a novel end-to-end heterogeneous acceleration framework for CNN inference on FPGAs. The basic idea is to decouple network deployment from hardware details through a hardware-software co-design approach. First, we propose a dataflow accelerator with an adaptive scheduling strategy; together with a scalable design, this strategy maximizes hardware utilization in terms of both computing resources and bandwidth. Second, we design a novel operator-perception method that automates network reconstruction and operator fusion. Third, we integrate Pflow into the industrial-grade deep learning framework Paddle-Lite. We evaluate Pflow by deploying several networks on two representative FPGA platforms. Experimental results show that Pflow achieves energy efficiencies of 46.5 GOPS/W on the Xilinx Zynq UltraScale+ MPSoC 3EG and 59.4 GOPS/W on the Virtex UltraScale+ XCVU13P, with throughput of up to 255.7 GOPS and 3.686 TOPS, respectively.
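The abstract reports throughput (GOPS) and energy efficiency (GOPS/W) but no power figure. Because efficiency is throughput divided by power, a rough power estimate follows from dividing one by the other. The sketch below does that arithmetic under a loud assumption: that the "up to" throughput and the reported efficiency were measured on the same network and configuration, which the abstract does not state. The resulting wattages are therefore back-of-the-envelope estimates, not results from the paper. The function name `implied_power_watts` is hypothetical.

```python
# Figures taken from the abstract: (peak throughput in GOPS, energy efficiency in GOPS/W).
# 3.686 TOPS is converted to 3686 GOPS so both boards use the same unit.
REPORTED = {
    "Zynq UltraScale+ MPSoC 3EG": (255.7, 46.5),
    "Virtex UltraScale+ XCVU13P": (3686.0, 59.4),
}


def implied_power_watts(throughput_gops: float, efficiency_gops_per_w: float) -> float:
    """Estimate power (W) as throughput (GOPS) / efficiency (GOPS/W).

    ASSUMPTION: both figures come from the same workload and configuration;
    the abstract does not confirm this, so treat the result as a rough estimate.
    """
    if efficiency_gops_per_w <= 0:
        raise ValueError("energy efficiency must be positive")
    return throughput_gops / efficiency_gops_per_w


if __name__ == "__main__":
    for board, (gops, gops_per_w) in REPORTED.items():
        # Print a one-decimal estimate for each platform.
        print(f"{board}: ~{implied_power_watts(gops, gops_per_w):.1f} W")
```

Under that assumption, the estimates come to roughly 5.5 W for the 3EG and about 62.1 W for the XCVU13P. The gap is consistent with the former being an edge-class device and the latter a large data-center-class FPGA.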
computer science, software engineering, hardware & architecture