Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster

Chen Zhang, Di Wu, Jiayu Sun, Guangyu Sun, Guojie Luo, Jason Cong
DOI: https://doi.org/10.1145/2934583.2934644
2016-01-01
Abstract: Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices like GPGPUs. However, due to constrained on-chip resources and many other factors, single-board FPGA designs may have difficulty achieving optimal energy efficiency. In this paper, we present a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency. A dynamic programming algorithm is proposed to map the CNN computing layers efficiently to different FPGA boards. To demonstrate the potential of the architecture, we built a prototype system with seven FPGA boards connected by high-speed serial links. Experimental results on AlexNet and VGG-16 show that the prototype achieves up to 21x and 2x the energy efficiency of optimized multi-core CPU and GPU implementations, respectively.
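The abstract does not spell out the dynamic programming formulation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each layer has a single scalar cost (for example, estimated latency), that layers are assigned to boards in contiguous groups, and that pipeline throughput is limited by the slowest board. The names `partition_layers`, `layer_costs`, and `num_boards` are hypothetical, and the sketch ignores resource limits, inter-board bandwidth, and energy, all of which a real mapping would need to model.

```python
from itertools import accumulate


def partition_layers(layer_costs, num_boards):
    """Split layers into at most num_boards contiguous stages.

    Returns (bottleneck, stages), where bottleneck is the cost of the
    slowest stage and stages is a list of layer-index lists.
    Assumption: layer_costs is non-empty and holds one scalar per layer.
    """
    n = len(layer_costs)
    prefix = [0] + list(accumulate(layer_costs))  # prefix[i] = sum of first i costs
    inf = float("inf")

    # best[b][i]: smallest bottleneck when the first i layers use exactly b boards.
    best = [[inf] * (n + 1) for _ in range(num_boards + 1)]
    # cut[b][i]: index where the last stage starts in that optimal split.
    cut = [[0] * (n + 1) for _ in range(num_boards + 1)]
    best[0][0] = 0

    for b in range(1, num_boards + 1):
        for i in range(1, n + 1):
            for p in range(b - 1, i):
                last_stage = prefix[i] - prefix[p]  # cost of layers p..i-1
                candidate = max(best[b - 1][p], last_stage)
                if candidate < best[b][i]:
                    best[b][i] = candidate
                    cut[b][i] = p

    # Using fewer boards is allowed; pick the board count with the best bottleneck.
    used = min(range(1, num_boards + 1), key=lambda b: best[b][n])

    # Walk the cut table backwards to recover the stages.
    stages = []
    i = n
    for b in range(used, 0, -1):
        p = cut[b][i]
        stages.append(list(range(p, i)))
        i = p
    stages.reverse()
    return best[used][n], stages


if __name__ == "__main__":
    # Hypothetical per-layer costs, purely for illustration.
    bottleneck, stages = partition_layers([4, 2, 3, 7, 1, 5], 3)
    print(bottleneck, stages)
```

Minimizing the slowest stage is one plausible objective for a deeply pipelined design, since a pipeline runs at the rate of its bottleneck; an energy-oriented objective would need a different cost function.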