Abstract: Spiking neural networks (SNNs) are the third generation of neural networks and can exploit both rate and temporal coding for energy-efficient event-driven computation. However, the decision accuracy of existing SNN designs is contingent upon processing a large number of spikes over a long period. Moreover, the switching power of SNN hardware accelerators is proportional to the number of spikes processed, while the length of spike trains limits throughput and static power efficiency. This paper presents the first study on developing temporal compression to significantly boost throughput and reduce energy dissipation of digital hardware SNN accelerators while being applicable to multiple spike codes. The proposed compression architectures consist of low-cost input spike compression units, novel input-and-output-weighted spiking neurons, and reconfigurable time constant scaling to support large and flexible time compression ratios. Our compression architectures can be transparently applied to any given pre-designed SNN employing either rate or temporal codes while incurring minimal modification of the neural models, learning algorithms, and hardware design. Using spiking speech and image recognition datasets, we demonstrate the feasibility of supporting large time compression ratios of up to 16x, delivering up to 15.93x, 13.88x, and 86.21x improvements in throughput, energy dissipation, and the tradeoffs between hardware area, runtime, energy, and classification accuracy, respectively, based on different spike codes on a Xilinx Zynq-7000 FPGA. These results are achieved while incurring little extra hardware overhead.

What problem does this paper attempt to address?

This paper addresses how to significantly increase throughput and reduce energy consumption in hardware spiking neural network accelerators while maintaining decision-making accuracy. Existing spiking neural network (SNN) designs must process a large number of spikes over a long period to reach a decision, which leads to high energy consumption and low throughput in hardware accelerators. To solve this problem, the paper proposes a time-compression technique that shortens the duration of spike trains while preserving the number of spikes and their temporal characteristics.

Specifically, the main contributions of the paper include:

1. **Universal Time-Compression Technique**: For the first time, a universal time-compression technique is proposed that can transparently compress the duration of spike trains for any given SNN and spike code, thereby greatly reducing latency.
2. **Four Key Techniques**:
   - **Weighted-Representation Spike-Train Compression**: Represent the compressed spike train in a weighted form in order to preserve the number and temporal characteristics of the original spikes.
   - **Input-Output-Weighted (IOW) Spiking Neural Model**: Process the time-compressed spike train and generate weighted output spikes to preserve the input information.
   - **Scaling of Time Constants**: Adjust the time constants of the neuron, synapse, and learning dynamics to match the compressed time scale.
   - **Flexible Compression-Ratio Support**: Use a time-averaging method to support flexible compression ratios, including ratios that are not powers of 2.
3. **Low-Overhead Hardware Modifications**: Make low-overhead modifications to existing SNN accelerators so that they can operate on a compressed time scale while preserving spike counts and temporal behavior in both inference and training.
4. **Time-Compressed SNN Accelerator Architecture**: Propose a time-compressed SNN (TC-SNN) accelerator architecture and its programmable variant (PTC-SNN), which achieve significant improvements in latency, energy efficiency, and the trade-offs among latency, energy, and classification accuracy.

By implementing multiple liquid-state-machine (LSM) spiking neural accelerators on a Xilinx Zynq-7000 FPGA, the paper demonstrates the effectiveness of the proposed TC-SNN and PTC-SNN compression architectures. Experimental results show that these architectures can support time-compression ratios of up to 16x under different spike codes, with improvements of up to 15.93x in throughput, 13.88x in energy dissipation, and 86.21x in the trade-offs among hardware area, runtime, energy, and classification accuracy.
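The weighted-representation compression and time-constant scaling described above can be illustrated with a minimal Python sketch. This is not the paper's hardware design. It assumes that the weight of each compressed time step is simply the count of original spikes falling in that window, and that a time constant is divided by the compression ratio. The function names `compress_spike_train` and `scale_time_constant` are hypothetical and chosen only for this illustration.

```python
from typing import List


def compress_spike_train(spikes: List[int], ratio: int) -> List[int]:
    """Merge every `ratio` consecutive time steps into one weighted step.

    Assumption: the weight of a compressed step is the number of original
    spikes in its window, so the total spike count is preserved.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    # Zero-pad the tail so the length is a multiple of the ratio.
    padded = spikes + [0] * (-len(spikes) % ratio)
    return [sum(padded[i:i + ratio]) for i in range(0, len(padded), ratio)]


def scale_time_constant(tau: float, ratio: int) -> float:
    """Assumption: dynamics on the compressed axis use tau divided by the ratio."""
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    return tau / ratio


# Example: a 14-step binary spike train compressed by a ratio of 4.
original = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
compressed = compress_spike_train(original, 4)
print(compressed)                  # weighted spikes on 4 compressed steps
print(sum(original), sum(compressed))  # spike count before and after
print(scale_time_constant(32.0, 4))
```

In this sketch the compressed train is four steps long instead of fourteen, while the sum of the weights equals the original spike count. A neuron model consuming such input would need to accept multi-valued (weighted) spikes, which is the role the summary attributes to the IOW neuron.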

Boosting Throughput and Efficiency of Hardware Spiking Neural Accelerators using Time Compression Supporting Multiple Spike Codes

Spike Trains Encoding Optimization for Spiking Neural Networks Implementation in FPGA

Spike Trains Encoding and Threshold Rescaling Method for Deep Spiking Neural Networks

An Efficient Spiking Neural Network Accelerator with Sparse Weight

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

A Convolutional Spiking Neural Network Accelerator with the Sparsity-Aware Memory and Compressed Weights

An Event-driven Spiking Neural Network Accelerator with On-chip Sparse Weight

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology

A Low Power and Low Latency FPGA-Based Spiking Neural Network Accelerator

A Fast Spiking Neural Network Accelerator based on BP-STDP Algorithm and Weighted Neuron Model

A Sparsity-Adapted Hardware Implementation of SNN for Cortical Spike Trains Decoding

A TTFS-based energy and utilization efficient neuromorphic CNN accelerator

PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning

A Cost-Efficient High-Speed VLSI Architecture for Spiking Convolutional Neural Network Inference Using Time-Step Binary Spike Maps

SPAT: FPGA-based Sparsity-Optimized Spiking Neural Network Training Accelerator with Temporal Parallel Dataflow

A Reconfigurable FPGA-based Spiking Neural Network Accelerator

Exploiting Temporal-Unrolled Parallelism for Energy-Efficient SNN Acceleration

A Time-to-first-spike Coding and Conversion Aware Training for Energy-Efficient Deep Spiking Neural Network Processor Design

MF-DSNN: An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator

SATO: spiking neural network acceleration via temporal-oriented dataflow and architecture

A Spiking Neural Network Accelerator based on Ping-Pong Architecture with Sparse Spike and Weight