Abstract: Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace weight multiplications in ANNs with additions, which are more energy-efficient and less computationally intensive. However, it remains a challenge to train deep SNNs due to the discrete spike function. A popular approach to circumvent this challenge is ANN-to-SNN conversion. However, due to the quantization error and accumulating error, it often requires many time steps (high inference latency) to achieve high performance, which negates the advantages of SNNs. To this end, this paper proposes Fast-SNN, which achieves high performance with low latency. We demonstrate the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the minimization of the quantization error, we show that the sequential error is the primary cause of the accumulating error, which is addressed by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Code is available at: https://github.com/yangfan-hu/Fast-SNN.
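
The mapping between temporal quantization in an SNN and spatial quantization in an ANN can be illustrated with a small sketch. It is not the authors' code. It assumes a constant input per time step, an integrate-and-fire (IF) neuron with reset by subtraction, zero initial membrane potential, and a floor-based quantizer. The names `quantized_relu` and `if_neuron_rate` are hypothetical, and the paper's exact quantizer may differ. Under these assumptions, the spike count over `T` steps, scaled by `theta / T`, equals a ReLU clipped to `[0, theta]` and quantized to `T` levels.

```python
import math
from fractions import Fraction


def quantized_relu(x, theta, T):
    """Spatial quantization: clip x to [0, theta] and floor it onto T uniform levels."""
    level = min(max(math.floor(x * T / theta), 0), T)
    return level * theta / T


def if_neuron_rate(x, theta, T):
    """Temporal quantization: feed constant input x to an IF neuron for T steps.

    The neuron resets by subtraction. The return value is the spike count
    scaled by theta / T, i.e. the rate-coded activation.
    """
    v = 0        # membrane potential, starts at zero (assumption)
    spikes = 0
    for _ in range(T):
        v += x
        if v >= theta:
            spikes += 1
            v -= theta
    return spikes * theta / T


# Exact rational inputs avoid floating-point ties at the quantization boundaries.
theta = Fraction(1)
for T in (3, 7, 15):
    for k in range(-4, 25):
        x = Fraction(k, 8)
        assert quantized_relu(x, theta, T) == if_neuron_rate(x, theta, T)
print("spike rate matches the quantized activation for all tested inputs")
```

The equality holds here only because the input is the same at every step. When the input varies over time, the order of arrival matters, which is where the sequential error discussed in the abstract comes from.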

What problem does this paper attempt to address?

The main problem this paper addresses is how to improve the performance of spiking neural networks (SNNs) while reducing inference latency. It proposes a method called Fast-SNN, which targets high performance at low latency by reducing the quantization error and the accumulated error. The problems are explained in detail below:

1. **Quantization Error**:
   - Information processing in SNNs is based on discrete events in time (spikes), which introduces quantization error relative to the continuous activation values of traditional artificial neural networks (ANNs).
   - By showing the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, the paper transfers the minimization of the quantization error to quantized ANN training. Supervised training finds the optimal clipping range and new distributions of weights and activations for each layer.
2. **Accumulated Error**:
   - When the quantized ANN is converted into an SNN, the errors of each layer accumulate, degrading the performance of deep networks. With the quantization error minimized, the paper identifies the sequential error as the primary cause of the accumulated error.
   - The paper reduces the accumulated error by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. The signed IF neuron offsets wrongly fired spikes, and layer-wise fine-tuning alleviates the cross-layer accumulated error by minimizing the difference between the spike rate of the SNN and the activation of the ANN.
3. **Low Latency**:
   - Traditional ANN-to-SNN conversion methods usually require a large number of time steps (high inference latency) to achieve high performance, which offsets the advantages of SNNs in computation and energy consumption.
   - The proposed method achieves performance comparable to that of ANNs within a small number of time steps (e.g., 3, 7, or 15), thus retaining the low-latency advantage of SNNs.

### Specific Contributions

- **Quantization Error Minimization**: Demonstrates the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, and minimizes the quantization error through supervised training of the quantized ANN, which finds the optimal clipping range and new distributions of weights and activations for each layer.
- **Sequential Error Minimization**: Reduces the accumulated error by handling the sequential error of each layer with the signed IF neuron model and the layer-wise fine-tuning mechanism, which lowers the number of time steps needed and therefore the inference latency.
- **Multiple Computer Vision Tasks**: Achieves state-of-the-art performance and low latency on image classification, object detection, and semantic segmentation.

### Related Work

- **Direct Training Methods**: These train SNNs directly using surrogate gradients, but face challenges in computational and memory efficiency, especially for deep networks.
- **ANN-to-SNN Conversion**: Existing conversion methods mainly optimize the clipping range through statistical analysis but do not optimize the distributions of weights and activations, resulting in poor performance at low latency.

### Summary

The paper proposes Fast-SNN, a method that yields high-performance, low-latency SNNs by minimizing the quantization error and the accumulated error. It analyzes the main sources of conversion error and reports state-of-the-art performance at low latency on image classification, object detection, and semantic segmentation.
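
To make the sequential error and the role of a signed IF neuron concrete, here is a minimal sketch. It is not the authors' implementation. The function `run_if` is hypothetical, and the negative-spike rule (fire `-1` when the membrane potential drops below zero and the neuron has a positive net spike count) is a simplifying assumption; the paper's exact negative threshold may differ. The example feeds a large positive input followed by a negative one, so a plain IF neuron fires early and cannot take the spike back.

```python
def run_if(inputs, theta, signed=False):
    """Return the spike train (+1, 0 or -1 per step) of an IF neuron.

    inputs: input current at each time step
    theta:  firing threshold, also the amount subtracted (or restored) per spike
    signed: if True, allow negative spikes that cancel earlier positive ones
    """
    v = 0.0      # membrane potential
    net = 0      # net spike count so far (positive minus negative spikes)
    train = []
    for x in inputs:
        v += x
        if v >= theta:
            s = 1
        elif signed and v < 0 and net > 0:
            # Assumed rule: only cancel spikes that were actually emitted.
            s = -1
        else:
            s = 0
        v -= s * theta
        net += s
        train.append(s)
    return train


inputs = [1.5, -1.0, 0.0]   # total input 0.5 is below theta, so the target count is 0
print("plain IF :", run_if(inputs, theta=1.0))
print("signed IF:", run_if(inputs, theta=1.0, signed=True))
```

The plain neuron ends with one spike too many because the positive input arrived first, while the signed neuron emits a compensating negative spike and ends at the target count. Requiring a positive net count before firing a negative spike keeps the overall response non-negative, like a ReLU.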

Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN

Quantization Framework for Fast Spiking Neural Networks

Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks

High-accuracy deep ANN-to-SNN conversion using quantization-aware training framework and calcium-gated bipolar leaky integrate and fire neuron

Optimal ANN-SNN Conversion for Fast and Accurate Inference in Deep Spiking Neural Networks

Spiking Deep Residual Networks

Spike Trains Encoding and Threshold Rescaling Method for Deep Spiking Neural Networks

Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation

A universal ANN-to-SNN framework for achieving high accuracy and low latency deep Spiking Neural Networks

An all integer-based spiking neural network with dynamic threshold adaptation

Highly Efficient SNNs for High-speed Object Detection

Toward High-Accuracy and Low-Latency Spiking Neural Networks With Two-Stage Optimization

A Novel Conversion Method for Spiking Neural Network Using Median Quantization

CSNN: an Augmented Spiking Based Framework with Perceptron-Inception

Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications

Efficient Converted Spiking Neural Network for 3D and 2D Classification

Spike Calibration: Fast and Accurate Conversion of Spiking Neural Network for Object Detection and Segmentation

SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking Neural Networks

Low Latency Spiking ConvNets with Restricted Output Training and False Spike Inhibition

Quantisation and Pooling Method for Low-Inference-latency Spiking Neural Networks

FastSNN: A CUDA-Based Programming Framework for Rapid Training SNNs