Optimization of Convolution Neural Network Algorithm Based on FPGA

Feixue Tang, Weichao Zhang, Xiaogang Tian, Xiaoye Fan, Xixin Cao
DOI: https://doi.org/10.1007/978-981-13-1026-3_10
2018-01-01
Abstract: The traditional CNN algorithm requires a great deal of computation and is difficult to optimize. On hardware platforms, computational throughput is poorly matched to memory bandwidth. Existing schemes do not take full advantage of logic resources, nor do they make full use of memory bandwidth, and none of them achieves the best performance. In this paper, on the software side, we use the common im2col method to convert the convolution operation into matrix multiplication, which effectively improves calculation speed. On the hardware side, we propose a nested-loop optimization structure. First, the correlation between the parameters is analyzed, the number of multiplications is reduced, and the multiplication in the inner loop is replaced by an addition. As a result, the maximum operating frequency and the power consumption are improved remarkably. Second, the input data and the convolution kernels are optimized by multi-level partitioning. The multi-layer input data is grouped by 2k, and the data of each layer is optimized in L groups. At the same time, the convolution kernels are also grouped by 2k and optimized to operate in parallel and in synchronization with the data. The structure therefore significantly improves the degree of parallelism. For the same total amount of computation, both the external and the internal bandwidth are improved significantly.
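The im2col step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes stride 1, no padding, and plain nested lists, and the function names (im2col, matmul, conv_im2col, conv_direct) are hypothetical helpers chosen for illustration. It shows only the software-side idea of unfolding input patches into a matrix so that convolution becomes a single matrix multiplication.

```python
def im2col(x, k):
    """Unfold a C x H x W input (nested lists) into a (C*k*k) x (OH*OW) matrix."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - k + 1, w - k + 1  # assumption: stride 1, no padding
    cols = []
    for ch in range(c):
        for di in range(k):
            for dj in range(k):
                # One row per (channel, kernel offset); one column per output pixel.
                cols.append([x[ch][i + di][j + dj]
                             for i in range(oh) for j in range(ow)])
    return cols


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    bt = list(zip(*b))  # columns of b
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def conv_im2col(x, kernels):
    """Convolution (CNN-style cross-correlation) via im2col.

    kernels has shape M x C x k x k; the result has M rows of OH*OW values.
    """
    k = len(kernels[0][0])
    # Flatten each C x k x k kernel into one row, matching im2col's row order.
    wmat = [[v for ch in kern for row in ch for v in row] for kern in kernels]
    return matmul(wmat, im2col(x, k))


def conv_direct(x, kernels):
    """Reference nested-loop convolution, used only to check the result."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0][0])
    oh, ow = h - k + 1, w - k + 1
    return [[sum(kern[ch][di][dj] * x[ch][i + di][j + dj]
                 for ch in range(c) for di in range(k) for dj in range(k))
             for i in range(oh) for j in range(ow)]
            for kern in kernels]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3 x 3 input
kernels = [[[[1, 0], [0, 1]]]]            # 1 kernel, 1 channel, 2 x 2
result = conv_im2col(x, kernels)
print(result)                             # [[6, 8, 12, 14]]
```

The unfolded matrix duplicates overlapping input pixels, so the approach trades extra memory for a regular matrix product that is easy to parallelize.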