A Layer-Based Structured Design of CNN on FPGA

Chao Huang, Siyu Ni, Gengsheng Chen
DOI: https://doi.org/10.1109/asicon.2017.8252656
2017-01-01
Abstract: Convolutional neural networks (CNNs) are widely used in machine learning applications. Because of their large scale, however, most deep CNNs are difficult to implement on a single hardware device for acceleration. This paper presents a new design and implementation of a 23-layer SqueezeNet [1] on a Xilinx VC709 FPGA board. The design proposes a novel layer-based structured method that gives full scalability in constructing CNNs: all CNN layers are optimized and deployed separately and independently. Moreover, the inherent parallelism in the CNN's data channels and intra-kernel computations, together with the data structure in memory, is exploited and optimized to improve performance and efficiency. The design and its architecture enable a flexible and scalable deployment of the whole CNN, with all of its layers working concurrently in a pipelined structure. Experimental results show that the implemented 23-layer SqueezeNet reaches a peak performance of 213.7 GOP/s at a 110 MHz clock frequency with 79.05% top-5 accuracy, which is much faster and more efficient than similar works. Additionally, for the same CNN, the FPGA shows much better power efficiency than CPU, GPU, and SoC implementations.
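To put the reported peak figure in perspective, the short sketch below converts 213.7 GOP/s at 110 MHz into operations per clock cycle. The two input numbers come from the abstract; the function name and the convention that one multiply-accumulate (MAC) counts as two operations are assumptions made here for illustration, since the abstract does not state how operations are counted.

```python
def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle at peak throughput."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


peak_gops = 213.7  # peak performance reported in the abstract, in GOP/s
clock_mhz = 110.0  # clock frequency reported in the abstract, in MHz

ops = ops_per_cycle(peak_gops, clock_mhz)

# Assumption (not stated in the abstract): 1 MAC = 2 operations.
macs = ops / 2

print(f"{ops:.1f} ops/cycle, about {macs:.0f} MACs/cycle")
```

Under that assumption, the peak figure corresponds to roughly 1,943 operations, or about 971 MACs, per clock cycle across all concurrently working layers. This is an aggregate estimate only and says nothing about how the units are distributed among the layers.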