Abstract: Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models, as they naturally implement event-driven computations while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNNs) to reduced-precision spiking models, demonstrating fast and accurate inference in a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that reduced-precision ResNet-18 and VGG-11 SNN models achieve classification accuracy within 1% of the baseline full-precision DNN model within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator with a throughput of 38.4 giga operations per second (GOPS) consuming 1.54 Watts on a PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25 GOPS per DSP slice, which is 2x and 4.5x higher utilisation efficiency, respectively, compared to the state-of-the-art. Our co-optimisation strategy can be employed to develop deep reduced-precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.
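The throughput figures quoted in the abstract can be cross-checked with a few ratios. The sketch below is only arithmetic on the reported numbers; the derived quantities (energy efficiency, implied PE and DSP-slice counts, implied baseline efficiencies) and all variable names are assumptions for illustration and are not stated in the source.

```python
# Figures quoted in the abstract.
throughput_gops = 38.4   # total throughput, GOPS
power_w = 1.54           # power on the PYNQ-Z2, watts
gops_per_pe = 0.6        # throughput per processing element
gops_per_dsp = 2.25      # throughput per DSP slice

# Derived ratios (assumed interpretation, not reported by the authors).
energy_efficiency = throughput_gops / power_w        # GOPS per watt
implied_pe_count = throughput_gops / gops_per_pe     # PEs implied by the per-PE figure
implied_dsp_count = throughput_gops / gops_per_dsp   # DSP slices implied by the per-DSP figure

# The "2x" and "4.5x" claims imply these state-of-the-art baselines.
baseline_gops_per_pe = gops_per_pe / 2.0
baseline_gops_per_dsp = gops_per_dsp / 4.5

print(f"{energy_efficiency:.1f} GOPS/W")
print(f"{implied_pe_count:.0f} PEs implied")
print(f"{implied_dsp_count:.1f} DSP slices implied")
print(f"baseline: {baseline_gops_per_pe:.2f} GOPS/PE, {baseline_gops_per_dsp:.2f} GOPS/DSP")
```

Under these assumptions the design works out to roughly 24.9 GOPS/W and about 64 processing elements. The implied DSP-slice count is not a whole number, which suggests the per-DSP figure is rounded, so these values should be read as approximate.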

What problem does this paper attempt to address?

The main problem this paper addresses is how to improve the energy efficiency and inference speed of machine-learning models based on spiking neural networks (SNNs) on hardware, while maintaining classification accuracy comparable to that of traditional artificial neural networks (ANNs). Specifically, the researchers develop a hardware-software co-optimization strategy that converts software-trained deep neural networks (DNNs) into low-precision spiking models and achieves fast and accurate inference on a new event-driven CMOS reconfigurable spiking inference accelerator.

### Main contributions include:

1. **High-precision and low-precision SNN optimization**: The study shows how to convert a software-trained full-precision ResNet-18 deep network into a low-precision SNN suited to hardware implementation, while maintaining high classification accuracy and completing inference within 8 time steps.
2. **Area- and power-efficient processing element**: A new processing element (PE) is designed, which contains three 8-bit multiplexers and one 8-bit adder. It is suited to implementing 3×3 convolution kernels and avoids the use of expensive multipliers. The ability of this PE to support other convolution kernel sizes and fully-connected layers is also verified.
3. **Efficient hardware architecture**: An efficient, reconfigurable, low-latency hardware accelerator for spiking neural networks is proposed. It addresses several limitations of existing research, such as the low accuracy of SNNs and the need for more than 50 time steps to approach the performance of DNNs.

### Experimental results:

- **ResNet-18**: On the CIFAR-10 dataset, the accuracy of the baseline FP32 ANN is 95.83% and the accuracy after quantization is 94.37%. With the integrate-and-fire (IF) spiking activation model, the SNN reaches 94.71% within 8 time steps, exceeding the accuracy of the quantized ANN.
- **VGG-11**: The accuracy of the baseline ANN is 91.25% and the accuracy after quantization is 90.05%. With the IF activation model, the SNN reaches 90.47%.

### Hardware implementation:

- The FPGA implementation uses the Xilinx PYNQ-Z2 board and achieves a throughput of 38.4 GOPS with a power consumption of 1.54 watts.
- The efficiency is 0.6 GOPS per processing element and 2.25 GOPS per DSP slice, which are 2 times and 4.5 times that of the state of the art, respectively.

### Conclusion:

Through the hardware-software co-optimization method, this paper improves the energy efficiency and inference speed of SNNs on hardware while maintaining classification accuracy comparable to that of traditional ANNs. The result offers a route to resource-efficient event-driven hardware accelerators for edge applications.
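The multiplier-free PE and the IF activation described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' design: the way the three multiplexers feed the adder, the row-by-row accumulation of a 3×3 kernel, the reset-by-subtraction rule, and every function name (`mux_select`, `pe_row_sum`, `conv3x3_spike`, `if_neuron`) are hypothetical. It also uses unbounded Python integers rather than modelling 8-bit overflow.

```python
from typing import List, Sequence


def mux_select(spike: int, weight: int) -> int:
    """Model a 2-to-1 multiplexer: pass the weight on a spike, else zero."""
    return weight if spike else 0


def pe_row_sum(spikes: Sequence[int], weights: Sequence[int]) -> int:
    """Add three spike-gated weights (assumed to be one row of a 3x3 kernel)."""
    assert len(spikes) == len(weights) == 3
    return sum(mux_select(s, w) for s, w in zip(spikes, weights))


def conv3x3_spike(patch: Sequence[Sequence[int]],
                  kernel: Sequence[Sequence[int]]) -> int:
    """Accumulate a 3x3 binary spike patch against integer weights, row by row."""
    return sum(pe_row_sum(p, k) for p, k in zip(patch, kernel))


def if_neuron(inputs: Sequence[int], threshold: int) -> List[int]:
    """Integrate-and-fire over time steps, with reset by subtraction (assumed)."""
    potential = 0
    spikes_out = []
    for current in inputs:
        potential += current          # integrate the input current
        if potential >= threshold:    # fire once the threshold is reached
            spikes_out.append(1)
            potential -= threshold    # keep the residual potential
        else:
            spikes_out.append(0)
    return spikes_out


# Hypothetical weights and input spikes, chosen only for illustration.
kernel = [[1, -2, 3], [0, 4, -1], [2, 1, -3]]
patch = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]

current = conv3x3_spike(patch, kernel)
# With binary inputs, selection plus addition equals the usual dot product.
assert current == sum(s * w for p, k in zip(patch, kernel) for s, w in zip(p, k))

# Present the same patch for 8 time steps, matching the 8-step inference window.
output_spikes = if_neuron([current] * 8, threshold=16)
print(current, output_spikes)
```

Because the inputs are binary spikes, each multiplication reduces to a select-or-zero choice, which is why a multiplexer and an adder are enough. In this toy run the neuron fires on 5 of the 8 steps, a firing rate that approximates the ratio of input current to threshold.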

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology
