Optimization for Efficient Hardware Implementation of CNN on FPGA

Fasih Ud Din Farrukh, Tuo Xie, Chun Zhang, Zhihua Wang
DOI: https://doi.org/10.1109/cicta.2018.8706067
2018-11-01
Abstract: Deep neural networks (DNNs) have been a hot research topic in recent years. A key aspect of DNN research is real-time hardware implementation, which requires thorough knowledge of the hardware on which the DNN will be implemented. The computational complexity and resource consumption of DNNs are increasing over time. The Convolutional Neural Network (CNN) is a popular DNN architecture, especially for image classification. An efficient implementation strategy is required for a CNN to perform more computations in real time. The Field Programmable Gate Array (FPGA) is considered an energy-efficient choice for CNNs compared to Graphics Processing Units (GPUs). In this paper, a new idea is explored and implemented for the basic Processing Element (PE) of a CNN. An FPGA has a limited number of built-in multiplier-accumulator (MAC) units. In this work, the MAC units are replaced by a Wallace Tree based multiplier, which belongs to the family of log-time array multipliers. This saves built-in MAC units, so more processing elements can be implemented on the FPGA.
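To make the Wallace Tree idea concrete, the following is a minimal bit-level software model in Python, not the hardware design from the paper. It assumes unsigned operands and a column-wise reduction that uses full adders (3:2 compressors) and half adders until at most two bits remain per column. The names `full_adder`, `half_adder`, and `wallace_multiply` and the `width` parameter are illustrative assumptions; the abstract does not specify the PE's internal structure or word length.

```python
def full_adder(a, b, c):
    """3:2 compressor: three input bits -> (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Two input bits -> (sum bit, carry bit)."""
    return a ^ b, a & b


def wallace_multiply(x, y, width=8):
    """Multiply two unsigned `width`-bit integers with a Wallace-style tree.

    Illustrative software model only; `width` is an assumed word length.
    """
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")

    # Step 1: generate partial-product bits. columns[w] holds bits of weight 2**w.
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Step 2: compress every column in parallel until each holds at most 2 bits.
    # Each pass models one stage of the tree.
    while max(len(col) for col in columns) > 2:
        nxt = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                nxt[w].append(s)      # sum stays in the same column
                nxt[w + 1].append(c)  # carry moves one column up
                k += 3
            if len(col) - k == 2:
                s, c = half_adder(col[k], col[k + 1])
                nxt[w].append(s)
                nxt[w + 1].append(c)
            elif len(col) - k == 1:
                nxt[w].append(col[k])  # a lone bit passes through unchanged
        columns = nxt

    # Step 3: the remaining two rows are summed. Ordinary integer addition
    # stands in here for the final carry-propagate adder.
    return sum(bit << w for w, col in enumerate(columns) for bit in col)
```

For example, `wallace_multiply(13, 11, width=4)` returns the same value as `13 * 11`. Because all columns are compressed in parallel, the maximum column height shrinks by roughly a factor of 3/2 per stage, which is where the logarithmic depth of this multiplier family comes from.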