A high throughput acceleration for hybrid neural networks with efficient resource management on FPGA

Shouyi Yin,Shibin Tang,Xinhan Lin,Peng Ouyang,Fengbin Tu,Leibo Liu,Shaojun Wei
DOI: https://doi.org/10.1109/tcad.2018.2821561
IF: 2.9
2019-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract:Deep learning is the amazing technology which has promoted the development of artificial intelligence and achieved many amazing successes in intelligent fields. Convolution-based layers (CLs), fully connected layers (FLs) and recurrent layers (RLs) are three types of layers in classic neural networks. Most intelligent tasks are implemented by the hybrid neural networks (hybrid-NNs), which are commonly composed of different layer-blocks (LBs) of CLs, FLs, and RLs. Because the CLs require the most computation in hybrid-NNs, many field-programmable gate array (FPGA)-based accelerators focus on CLs acceleration and have demonstrated great performance. However, the CLs accelerators lead to an underutilization of FPGA resources in the acceleration of the whole hybrid-NN. To fully exploit the logic resources and the memory bandwidth in the acceleration of CLs/FLs/RLs, we propose an FPGA resource efficient mapping mechanism for hybrid-NNs. The mechanism first improves the utilization of DSPs by integrating multiple small bit-width operations on one DSP. Then the LB-level spatial mapping is used to exploit the complementary features between different neural networks in the hybrid-NN. We evaluate the mapping mechanism by implementing four hybrid-NNs on Xilinx Virtex7 690T FPGA. The proposed mechanism achieves a peak performance of 1805.8 giga operations per second (GOPs). With the analysis on resource utilization and throughput, the proposed method exploits more computing power in FPGA and achieves up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$4.13 \times$ </tex-math></inline-formula> higher throughput than the state-of-the-art acceleration.
What problem does this paper attempt to address?