Abstract: Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace weight multiplications in ANNs with additions, which are more energy-efficient and less computationally intensive. However, it remains a challenge to train deep SNNs due to the discrete spike function. A popular approach to circumvent this challenge is ANN-to-SNN conversion. However, due to the quantization error and accumulating error, it often requires lots of time steps (high inference latency) to achieve high performance, which negates SNN's advantages. To this end, this paper proposes Fast-SNN that achieves high performance with low latency. We demonstrate the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the minimization of the quantization error, we show that the sequential error is the primary cause of the accumulating error, which is addressed by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Codes are available at: https://github.com/yangfan-hu/Fast-SNN.
What problem does this paper attempt to address?
The paper addresses how to obtain spiking neural networks (SNNs) that reach high performance at low inference latency. It proposes a method called Fast-SNN, which targets the two error sources that make ANN-to-SNN conversion slow: the quantization error and the accumulating error. The problems it tackles are explained in detail below:
1. **Quantization Error**:
- SNNs process information as discrete spike events over a finite number of time steps, which introduces quantization error relative to the continuous activation values of traditional artificial neural networks (ANNs).
- By showing an equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, the paper transfers the minimization of the quantization error to quantized ANN training. Supervised training then finds the optimal clipping range and new distributions of weights and activations for each layer.
2. **Accumulating Error**:
- When the quantized ANN is converted into an SNN, errors accumulate from layer to layer and degrade the performance of deep networks. With the quantization error minimized, the paper identifies the sequential error as the primary cause of this accumulating error.
- The paper reduces the accumulating error by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. The signed IF neuron offsets misfired spikes, and layer-wise fine-tuning alleviates the error that accumulates across layers by minimizing the difference between the spike rate of the SNN and the activation of the ANN.
3. **Low Latency**:
- Traditional ANN-to-SNN conversion methods usually require a large number of time steps (high inference latency) to achieve high performance, which offsets the computation and energy advantages of SNNs.
- The proposed method achieves performance comparable to that of ANNs with only a few time steps (e.g., 3, 7, or 15), thus retaining the low-latency advantage of SNNs.
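The equivalence between temporal and spatial quantization described in item 1 can be illustrated with a small sketch. It rests on assumptions that this summary does not state: an integrate-and-fire (IF) neuron with reset by subtraction, the same input at every time step, and a floor-based uniform quantizer with `T + 1` levels. The function names are hypothetical, and the paper's exact formulation may differ.

```python
from fractions import Fraction
from math import floor


def if_spike_count(x, theta, T):
    """Count spikes of an IF neuron (reset by subtraction) that
    receives the same input x at each of T time steps."""
    v = Fraction(0)   # membrane potential, starts at rest (assumption)
    count = 0
    for _ in range(T):
        v += x
        if v >= theta:    # fire at most one spike per time step
            v -= theta    # reset by subtraction keeps the residual
            count += 1
    return count


def quantized_level(x, theta, T):
    """Quantize x, clipped to [0, theta], onto the integer levels 0..T."""
    return min(max(floor(x * T / theta), 0), T)


# Exact rational arithmetic avoids floating-point edge cases at the threshold.
x, theta, T = Fraction(2, 5), Fraction(1), 7
print(if_spike_count(x, theta, T), quantized_level(x, theta, T))  # both give 2
```

Under these assumptions the spike count over `T` steps equals the quantization level of the clipped activation, so a smaller `T` corresponds to a coarser quantizer. This is one way to read why lowering the quantization error in the ANN carries over to the SNN.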
### Specific Contributions
- **Quantization Error Minimization**: The paper demonstrates the equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, and minimizes the quantization error through supervised training of the quantized ANN, which finds the optimal clipping range and new distributions of weights and activations for each layer.
- **Sequential Error Minimization**: The paper reduces the accumulating error by addressing the sequential error with a signed IF neuron model and a layer-wise fine-tuning mechanism, which lowers the number of time steps needed and hence the inference latency.
- **Multiple Computer Vision Tasks**: The method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation.
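A toy example may help show what a misfired spike is and how a signed neuron could offset it. The firing rule below is an assumption made for illustration: a negative spike is emitted when the membrane potential drops below zero while the neuron has a positive net spike count. The summary does not give the paper's exact rule or thresholds, and `run_if` is a hypothetical name.

```python
def run_if(inputs, theta, signed=False):
    """Return the net spike count of an IF neuron (reset by subtraction).

    With signed=True, the neuron may emit a negative spike to cancel an
    earlier positive one (illustrative rule, not taken from the paper).
    """
    v = 0      # membrane potential
    net = 0    # positive spikes minus negative spikes
    for x in inputs:
        v += x
        if v >= theta:
            v -= theta
            net += 1
        elif signed and v < 0 and net > 0:
            v += theta    # give back the charge of the cancelled spike
            net -= 1
    return net


theta = 10
early_positive = [12, -5, -5]   # summed input is 2, below the threshold
late_positive = [-5, -5, 12]    # same inputs in a different order

print(run_if(early_positive, theta))               # 1: misfired spike
print(run_if(late_positive, theta))                # 0
print(run_if(early_positive, theta, signed=True))  # 0: misfire cancelled
```

The summed input is the same in both orderings, yet the plain neuron's output depends on when the positive input arrives. That order dependence is the kind of sequential error the signed neuron is meant to remove.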
### Related Work
- **Direct Training Methods**: These methods train SNNs directly using surrogate gradients, but they face challenges in computational and memory efficiency, especially for deep networks.
- **ANN-to-SNN Conversion**: Existing conversion methods mainly optimize the clipping range through statistical analysis but do not optimize the distributions of weights and activations, resulting in poor performance at low latency.
### Summary
The paper proposes Fast-SNN, a method that yields high-performance, low-latency SNNs by minimizing the quantization error and the accumulating error. It addresses the key sources of error in ANN-to-SNN conversion and reports state-of-the-art performance at low latency on image classification, object detection, and semantic segmentation.