Anagha Nimbekar, Prabodh Katti, Chen Li, Bashir M. Al-Hashimi, Amit Acharyya, Bipin Rajendran
Abstract: Spiking Neural Networks (SNNs) have emerged as a promising approach to improving the energy efficiency of machine learning models, as they naturally implement event-driven computation while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNNs) to reduced-precision spiking models, demonstrating fast and accurate inference on a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that reduced-precision ResNet-18 and VGG-11 SNN models achieve classification accuracy within 1% of the baseline full-precision DNN models within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator with a throughput of 38.4 giga operations per second (GOPS), consuming 1.54 W on a PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25 GOPS per DSP slice, which is 2x and 4.5x higher utilisation efficiency, respectively, than the state of the art. Our co-optimisation strategy can be employed to develop deep reduced-precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.
What problem does this paper attempt to address?
The main problem this paper addresses is improving the energy efficiency and inference speed of machine-learning models based on spiking neural networks (SNNs) running on hardware, while keeping classification accuracy comparable to that of conventional artificial neural networks (ANNs). Specifically, the authors develop a software-hardware co-optimization strategy that converts software-trained deep neural networks (DNNs) into low-precision spiking models and runs them with fast, accurate inference on a new event-driven, reconfigurable CMOS spiking inference accelerator.
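Converting a full-precision DNN into a low-precision model starts by mapping floating-point weights to small integers. The summary does not state the exact quantization scheme, so the sketch below assumes symmetric per-tensor quantization to signed 8-bit integers (a width chosen to match the 8-bit multiplexers and adder mentioned in the contributions). The function names `quantize_symmetric` and `dequantize` are hypothetical.

```python
def quantize_symmetric(weights, n_bits=8):
    """Map float weights to signed n_bits integers sharing one scale.

    Symmetric per-tensor quantization is an illustrative assumption,
    not necessarily the scheme used in the paper.
    """
    q_max = 2 ** (n_bits - 1) - 1               # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed range.
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integer levels and the scale."""
    return [v * scale for v in q]


w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_symmetric(w)
```

The reconstruction error per weight is at most half a quantization step (`s / 2`), which is why accuracy drops only modestly after quantization.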
### Main contributions include:
1. **Full-precision to low-precision SNN conversion**: The study shows how to convert a software-trained, full-precision ResNet-18 deep network into a low-precision SNN suited to hardware implementation, while retaining high classification accuracy and completing inference within 8 timesteps.
2. **Area- and power-efficient processing element**: A new processing element (PE) is designed that contains three 8-bit multiplexers and one 8-bit adder. It is suited to implementing 3×3 convolution kernels and avoids expensive multipliers, since binary spike inputs only need to select weights rather than multiply them. The PE is also shown to support other kernel sizes and fully-connected layers.
3. **Efficient hardware architecture**: An efficient, reconfigurable, low-latency hardware accelerator for SNNs is proposed, addressing several limitations of prior work, such as low SNN accuracy and the need for more than 50 timesteps to approach DNN-level performance.
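The multiplier-free PE can be illustrated in software: because a spike is either 0 or 1, each multiplexer simply passes its weight or zero, and the adder accumulates the three results. The sketch below is a behavioural model under stated assumptions, not the paper's RTL: it assumes each PE step processes one row of three weights, so a 3×3 kernel takes three steps, and it does not model the adder's 8-bit width (Python integers do not overflow). The names `mux`, `pe_row` and `conv3x3_at` are hypothetical.

```python
def mux(spike, weight):
    """2:1 multiplexer: pass the weight when the input spike is 1, else 0."""
    return weight if spike else 0


def pe_row(spikes, weights, acc=0):
    """Hypothetical PE step: three muxes feed one adder that updates acc.

    spikes: three binary inputs; weights: three signed 8-bit integers.
    No multiplier is needed because each spike is either 0 or 1.
    """
    assert len(spikes) == len(weights) == 3
    return acc + sum(mux(s, w) for s, w in zip(spikes, weights))


def conv3x3_at(spike_patch, kernel):
    """Accumulate a 3x3 kernel as three row-wise PE steps (an assumption)."""
    acc = 0
    for spike_row, weight_row in zip(spike_patch, kernel):
        acc = pe_row(spike_row, weight_row, acc)
    return acc


patch = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
kernel = [[3, -2, 5], [7, -1, 4], [2, 2, -8]]
result = conv3x3_at(patch, kernel)
```

The result equals an ordinary multiply-accumulate over the same patch, which is the point of the design: selection plus addition replaces multiplication when inputs are binary.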
### Experimental results:
- **ResNet-18**: On CIFAR-10, the baseline FP32 ANN reaches 95.83% accuracy and the quantized ANN 94.37%. With the IF (integrate-and-fire) spiking activation, the SNN reaches 94.71% within 8 timesteps, exceeding the quantized ANN.
- **VGG-11**: The baseline ANN reaches 91.25% accuracy and the quantized ANN 90.05%. With the IF activation, the SNN reaches 90.47%.
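The IF activation replaces each ReLU: the neuron integrates its input every timestep and emits a spike whenever its membrane potential crosses a threshold, so the spike count over a short window approximates the ANN activation. The minimal sketch below assumes a constant input, a threshold of 1.0, and reset-by-subtraction; the paper's exact reset rule and threshold choice are not given in this summary, and `if_neuron_rate` is a hypothetical name.

```python
def if_neuron_rate(x, threshold=1.0, timesteps=8):
    """Simulate an integrate-and-fire neuron driven by a constant input x.

    Reset-by-subtraction is assumed here; the paper may use a different rule.
    Returns spike_count * threshold / timesteps, i.e. the SNN's estimate
    of the (clipped) ANN activation.
    """
    v = 0.0
    spikes = 0
    for _ in range(timesteps):
        v += x                  # integrate the input
        if v >= threshold:      # fire when the potential crosses threshold
            spikes += 1
            v -= threshold      # subtract the threshold, keeping the residue
    return spikes * threshold / timesteps
```

With 8 timesteps the output takes one of nine levels (0, 1/8, ..., 1), so inputs in [0, 1] are approximated to within 1/8, negative inputs give 0 and inputs above the threshold saturate at 1, mirroring a clipped, quantized ReLU.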
### Hardware implementation:
- The FPGA implementation targets the Xilinx PYNQ-Z2 board and achieves a throughput of 38.4 GOPS at a power consumption of 1.54 W.
- Throughput is 0.6 GOPS per processing element and 2.25 GOPS per DSP slice, which is 2x and 4.5x the utilization efficiency of the state of the art, respectively.
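Two further figures follow from the reported numbers by simple arithmetic. They are derived here, not reported by the authors: energy efficiency as throughput divided by power, and an implied PE count assuming the per-PE figure is total throughput divided by the number of PEs.

```python
THROUGHPUT_GOPS = 38.4   # reported FPGA throughput
POWER_W = 1.54           # reported power on the PYNQ-Z2
GOPS_PER_PE = 0.6        # reported throughput per processing element

# Derived, not reported: GOPS per watt.
energy_eff = THROUGHPUT_GOPS / POWER_W
# Derived, assuming per-PE throughput = total throughput / PE count.
implied_pes = THROUGHPUT_GOPS / GOPS_PER_PE

print(f"{energy_eff:.2f} GOPS/W")
print(f"{implied_pes:.0f} PEs")
```

This gives roughly 24.9 GOPS/W and about 64 PEs under the stated assumption.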
### Conclusion:
Through its software-hardware co-optimization method, the paper improves the energy efficiency and inference speed of SNNs on hardware while maintaining classification accuracy comparable to conventional ANNs. The results offer new directions for designing resource-efficient, event-driven hardware accelerators for edge applications.