FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator

Jindong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng
DOI: https://doi.org/10.48550/arXiv.2309.16158
2023-09-28
Abstract: Spiking Neural Networks (SNNs) are expected to be a promising alternative to Artificial Neural Networks (ANNs) due to their strong biological interpretability and high energy efficiency. Specialized SNN hardware offers clear advantages over general-purpose devices in terms of power and performance. However, there's still room to advance hardware support for state-of-the-art (SOTA) SNN algorithms and improve computation and memory efficiency. As a further step in supporting high-performance SNNs on specialized hardware, we introduce FireFly v2, an FPGA SNN accelerator that can address the issue of non-spike operation in current SOTA SNN algorithms, which presents an obstacle in the end-to-end deployment onto existing SNN hardware. To more effectively align with the SNN characteristics, we design a spatiotemporal dataflow that allows four dimensions of parallelism and eliminates the need for membrane potential storage, enabling on-the-fly spike processing and spike generation. To further improve hardware acceleration performance, we develop a high-performance spike computing engine as a backend based on a systolic array operating at 500-600MHz. To the best of our knowledge, FireFly v2 achieves the highest clock frequency among all FPGA-based implementations. Furthermore, it stands as the first SNN accelerator capable of supporting non-spike operations, which are commonly used in advanced SNN algorithms. FireFly v2 has doubled the throughput and DSP efficiency when compared to our previous version of FireFly and it exhibits 1.33 times the DSP efficiency and 1.42 times the power efficiency compared to the current most advanced FPGA accelerators.
Neural and Evolutionary Computing, Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the limited ability of existing dedicated SNN (Spiking Neural Network) hardware to support recent SNN algorithms, in particular the non-spike operations they contain. Specifically, existing SNN hardware designs cannot handle the following cases:

1. **Pixel operations in the direct encoding layer**: With direct input encoding, the first convolutional layer operates on analog pixel values, which are incompatible with existing spike-based SNN hardware.
2. **Multi-bit spike operations in SEW-ResNet**: SEW-ResNet introduces non-spike values through spike-element-wise summation, so the next convolutional layer has to perform a non-spike (multi-bit) convolution.
3. **Fractional-spike convolution introduced by the average pooling layer**: The average pooling commonly used in SNN models turns binary spikes into fractional values, and existing SNN hardware cannot support convolution on them.

To address these challenges, the paper proposes FireFly v2, an FPGA-based SNN accelerator with the following features:

- **Support for non-spike operations**: FireFly v2 supports direct encoding, spike-element-wise residual connections, and common average pooling operations.
- **Support for multiple neural dynamics**: It supports different neuron models, such as IF (Integrate-and-Fire), LIF (Leaky Integrate-and-Fire), and RMP (Residual Membrane Potential).
- **Arbitrary convolution configurations**: It supports different convolution kernel sizes, strides, and padding configurations.
- **Spatiotemporal dataflow**: It adopts a dataflow with four dimensions of parallelism: input-channel, output-channel, pixel-level, and time-step parallelism.
- **High-performance spike computing engine**: It integrates a high-performance systolic-array-based spike computing engine operating at 500-600 MHz.

With these improvements, FireFly v2 both supports recent SNN algorithms more completely and improves hardware performance: according to the abstract, it doubles the throughput and DSP (digital signal processing) efficiency of the previous FireFly and achieves 1.33 times the DSP efficiency and 1.42 times the power efficiency of the most advanced FPGA accelerators it is compared against.
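
To make the three sources of non-spike operations concrete, the following is a minimal Python sketch. It is not taken from the paper or from the FireFly v2 implementation; the helper names `avg_pool_2x2`, `sew_add`, and `is_binary`, as well as the sample spike maps and pixel values, are hypothetical and chosen only for illustration. It shows that each case feeds the next convolution with values other than binary 0/1 spikes.

```python
from fractions import Fraction


def avg_pool_2x2(spikes):
    """2x2 average pooling (stride 2) over a binary spike map."""
    h, w = len(spikes), len(spikes[0])
    return [
        [
            Fraction(
                spikes[r][c] + spikes[r][c + 1]
                + spikes[r + 1][c] + spikes[r + 1][c + 1],
                4,
            )
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def sew_add(branch, shortcut):
    """Spike-element-wise ADD of two spike maps (SEW-ResNet-style residual)."""
    return [
        [a + b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(branch, shortcut)
    ]


def is_binary(fmap):
    """True if every element is exactly 0 or 1, i.e. a plain spike map."""
    return all(v in (0, 1) for row in fmap for v in row)


# Hypothetical 4x4 binary spike maps from two branches of a network.
spikes_a = [[1, 0, 1, 1],
            [0, 0, 1, 1],
            [1, 1, 0, 0],
            [1, 1, 0, 1]]
spikes_b = [[1, 1, 0, 1],
            [0, 1, 1, 0],
            [0, 1, 0, 0],
            [1, 0, 0, 1]]

# Case 1: direct encoding feeds analog pixel values to the first layer.
pixels = [[0.12, 0.80], [0.55, 0.03]]

# Case 2: the residual sum of two spike maps can reach 2 (a multi-bit spike).
summed = sew_add(spikes_a, spikes_b)

# Case 3: average pooling yields fractions such as 1/4.
pooled = avg_pool_2x2(spikes_a)

for name, fmap in [("input spikes", spikes_a), ("pixels", pixels),
                   ("SEW sum", summed), ("avg pooled", pooled)]:
    print(f"{name:12s} binary={is_binary(fmap)}")
```

Only the raw spike map passes the `is_binary` check; the other three would require the multi-bit or fractional convolution support described above.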