Smilodon: an Efficient Accelerator for Low Bit-Width CNNs with Task Partitioning

Qinyu Chen, Yuxiang Fu, Kaifeng Cheng, Wenqing Song, Zhonghai Lu, Li, Chuan Zhang
DOI: https://doi.org/10.1109/iscas.2019.8702547
2019-01-01
Abstract: Convolutional Neural Networks (CNNs) have been widely applied in fields such as image and video recognition, recommender systems, and natural language processing. However, their massive size and intensive computation load hinder practical deployment, especially on embedded systems. Low bit-width CNNs have been proposed as a highly competitive candidate for efficient implementation. In this paper, we propose Smilodon, a scalable, efficient accelerator for low bit-width CNNs based on a parallel streaming architecture and optimized with a task partitioning strategy. We also present 3D systolic-like computing arrays suited to convolutional layers. Our design is implemented on a Zynq XC7Z020 FPGA and meets real-time requirements with a throughput of 1,622 FPS while consuming 2.1 W. To the best of our knowledge, our accelerator outperforms state-of-the-art works in the tradeoff among throughput, power efficiency, and area efficiency.