A High-Performance Accelerator for Large-Scale Convolutional Neural Networks

Fan Sun, Chao Wang, Lei Gong, Chongchong Xu, Yiwei Zhang, Yuntao Lu, Xi Li, Xuehai Zhou
DOI: https://doi.org/10.1109/ispa/iucc.2017.00099
2017-01-01
Abstract: Convolutional neural networks (CNNs) have been widely applied in various applications because they can achieve accuracy close to, or even better than, human-level perception. However, for large-scale CNNs, the computation-intensive convolutional layers and memory-intensive fully connected layers pose many challenges for implementation on FPGA platforms. Existing implementations use the same parallelism strategy for the entire CNN model; such a "one size fits all" approach can lead to inefficient resource utilization. To overcome this problem, this work proposes an FPGA-based accelerator consisting of multiple processing elements (PEs), each responsible for the computation of one layer in the network model. All the PEs are mapped onto one chip so that different layers can work concurrently in a pipelined style. A methodology is proposed to maximize the throughput of the accelerator. In the fully connected layers, a pruning method is used to decrease the number of weights, which reduces both storage and computation. Moreover, a batch-based computing method is applied to the compressed data to reduce the required memory bandwidth. As a case study, we implement a large-scale CNN model, AlexNet, on a VC707 board, which carries a Xilinx Virtex-7 485T FPGA. The proposed accelerator achieves a peak performance of 498.6 GOP/s and a power efficiency of 21.3 GOP/s/W at a 100 MHz clock frequency, which outperforms previous approaches.
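The abstract does not spell out the pruning rule or the batch scheme, so the following is only a minimal software sketch of the general idea, not the authors' hardware design. It assumes magnitude-threshold pruning and a simple (column, value) sparse storage; the function names `prune` and `fc_batch`, the threshold, and the toy weights are all hypothetical. It illustrates why batching can lower weight-memory traffic: each surviving weight is fetched once and reused for every input in the batch.

```python
def prune(weights, threshold):
    """Keep only weights whose magnitude is at least `threshold`.

    Assumed pruning rule (magnitude thresholding), chosen for illustration.
    Returns one list of (column, value) pairs per output row.
    """
    return [
        [(j, w) for j, w in enumerate(row) if abs(w) >= threshold]
        for row in weights
    ]


def fc_batch(sparse_rows, batch):
    """Fully connected layer applied to a batch of input vectors.

    Each stored weight is read once and applied to every sample in the
    batch, so the number of weight reads is independent of batch size.
    """
    outputs = [[0.0] * len(sparse_rows) for _ in batch]
    weight_reads = 0
    for i, row in enumerate(sparse_rows):
        for j, w in row:
            weight_reads += 1  # one fetch from weight memory
            for b, x in enumerate(batch):
                outputs[b][i] += w * x[j]  # reuse the fetched weight
    return outputs, weight_reads


# Toy 2x4 weight matrix and a batch of two inputs (hypothetical values).
weights = [
    [0.5, 0.01, -0.75, 0.0],
    [0.02, 0.25, 0.0, -1.0],
]
batch = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
]

sparse = prune(weights, threshold=0.1)
outputs, weight_reads = fc_batch(sparse, batch)
print(outputs)                    # per-sample layer outputs
print(weight_reads / len(batch))  # weight fetches amortized per sample
```

In this toy case pruning keeps 4 of the 8 weights, and a batch of two halves the weight fetches per sample again. The actual savings in the paper depend on its pruning ratio and batch size, which the abstract does not state.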