Configurable CNN Accelerator Based on Tiling Dataflow

Yihuang Li, Sheng Ma, Yang Guo, Rui Xu, Guilin Chen
DOI: https://doi.org/10.1109/icsess.2018.8663795
2018-01-01
Abstract: Convolutional neural networks (CNNs) are widely used in deep learning. Many CNN accelerators have been designed because specialized accelerators offer high energy efficiency, and accelerators based on the Tiling dataflow in particular have achieved high performance. However, hardware keeps growing in size, and the utilization of the Tiling dataflow can drop as the number of processing elements (PEs) increases. To build a high-utilization accelerator, we propose a Configurable CNN Accelerator based on the Tiling dataflow. Convolution can be expressed as a six-level nested loop, but the Tiling dataflow unrolls only two of these loops. A high-performance accelerator needs to exploit more parallelism to keep its PEs from idling. Through configuration, the Configurable CNN Accelerator can unroll four loops, and which loops are unrolled is selectable. Building on this, we propose a partially configurable technique that not only improves utilization when the number of PEs is very large but also keeps the hardware overhead as low as possible. Finally, we evaluate the Configurable CNN Accelerator on several mainstream CNNs. It achieves a 1.2-30x speedup over a 16×16 Tiling dataflow accelerator, and when the number of PEs exceeds 512, utilization stays at an average of 82%-90%.
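The abstract's core argument (convolution is a six-level loop nest, and unrolling only two of those loops onto a large PE array leaves many PEs idle) can be made concrete with a small model. This is a hypothetical sketch, not the authors' implementation. It assumes the Tiling dataflow unrolls the output-channel (`m`) and input-channel (`n`) loops, uses an illustrative layer shape (96 output channels, 3 input channels, 55×55 output) chosen for the example, and defines utilization as useful MACs divided by issued PE slots, ignoring memory stalls. The function names `conv_six_loops` and `pe_utilization` are made up for illustration.

```python
from math import ceil, prod


def conv_six_loops(ifmap, weights, stride=1):
    """Direct convolution written as six nested loops.

    ifmap:   [N][H][W]    (N input channels)
    weights: [M][N][K][K] (M output channels, K x K kernels)
    """
    M, N = len(weights), len(ifmap)
    K = len(weights[0][0])
    H, W = len(ifmap[0]), len(ifmap[0][0])
    R = (H - K) // stride + 1  # output rows
    C = (W - K) // stride + 1  # output columns
    out = [[[0] * C for _ in range(R)] for _ in range(M)]
    for m in range(M):                      # loop 1: output channels
        for n in range(N):                  # loop 2: input channels
            for r in range(R):              # loop 3: output rows
                for c in range(C):          # loop 4: output columns
                    for i in range(K):      # loop 5: kernel rows
                        for j in range(K):  # loop 6: kernel columns
                            out[m][r][c] += (weights[m][n][i][j]
                                             * ifmap[n][r * stride + i][c * stride + j])
    return out


def pe_utilization(loop_bounds, unroll):
    """Fraction of issued PE slots that do useful work.

    loop_bounds: loop name -> trip count for the layer.
    unroll:      loop name -> spatial unroll factor on the PE array
                 (product of factors = number of PEs).
    Loops that are not unrolled run temporally and cancel out of the ratio.
    A partially filled tile still occupies a full tile's worth of PEs.
    """
    useful = prod(loop_bounds[k] for k in unroll)
    issued = prod(ceil(loop_bounds[k] / t) * t for k, t in unroll.items())
    return useful / issued


# Hypothetical layer shape, for illustration only.
layer = {"m": 96, "n": 3, "r": 55, "c": 55}

# Two unrolled loops on a 16x16 array (256 PEs): n = 3 fills only 3 of 16 columns.
two_loop = pe_utilization(layer, {"m": 16, "n": 16})

# Four unrolled loops on the same 256 PEs: 16 * 4 * 2 * 2.
four_loop = pe_utilization(layer, {"m": 16, "n": 4, "r": 2, "c": 2})

print(f"2-loop 16x16:   {two_loop:.4f}")
print(f"4-loop 16x4x2x2: {four_loop:.4f}")
```

Under these assumptions, the two-loop mapping keeps only 3/16 = 0.1875 of the PE slots busy, while spreading the same 256 PEs across four loops raises utilization to about 0.72. This illustrates why exposing more loops to the PE array, and choosing which ones to unroll per layer, can help when one loop bound (here the 3 input channels) is much smaller than its tile size. These figures come from the toy model only, not from the paper's measurements.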