End-to-end Scalable FPGA Accelerator for Deep Residual Networks.

Yufei Ma, Minkyu Kim, Yu Cao, Sarma Vrudhula, Jae-sun Seo
DOI: https://doi.org/10.1109/iscas.2017.8050344
2017-01-01
Abstract: This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). The two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize the physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs) that exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50/152 algorithms on an Arria-10 FPGA, demonstrating 285.1/315.5 GOPS of throughput and 27.2/71.7 ms of latency, respectively.
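The throughput and latency figures in the abstract can be related by simple arithmetic. The sketch below multiplies the two to estimate the work per inference. It assumes, which the abstract does not state, that both figures describe the same single-image inference. The function name `implied_gop_per_image` is hypothetical and used only for illustration.

```python
# Figures quoted in the abstract: (throughput in GOPS, latency in ms).
REPORTED = {
    "ResNet-50": (285.1, 27.2),
    "ResNet-152": (315.5, 71.7),
}


def implied_gop_per_image(throughput_gops: float, latency_ms: float) -> float:
    """Return the work per inference, in giga-operations (GOP).

    Assumption (not stated in the abstract): throughput and latency are
    measured on the same single-image inference, so work = rate x time.
    """
    return throughput_gops * latency_ms / 1000.0  # ms -> s


if __name__ == "__main__":
    for name, (gops, ms) in REPORTED.items():
        print(f"{name}: {implied_gop_per_image(gops, ms):.2f} GOP per image")
```

Under that assumption, the reported numbers imply roughly 7.75 GOP per image for ResNet-50 and 22.62 GOP per image for ResNet-152. The abstract does not say how operations are counted, so these values are only a consistency check on the quoted figures, not a result from the paper.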