Abstract: Event-driven spiking neural networks (SNNs) have demonstrated significant potential for high energy and area efficiency. However, existing SNN accelerators suffer from high latency and energy consumption because of serial accumulation-comparison operations: an SNN neuron integrates incoming spikes, accumulates membrane potential, and generates an output spike when the potential exceeds a threshold. One way to address this is to exploit the sparsity of SNN spikes to reduce the number of time steps, but this can leave workloads imbalanced among neurons and limit the utilization of processing elements (PEs). In this paper, we present SATO, a temporal-parallel SNN accelerator that accumulates membrane potential for all time steps in parallel. SATO adopts a two-stage pipeline that decouples neuron computations, which maintains accuracy while exposing opportunities for fine-grained parallelism: spike accumulation for each time step executes concurrently, improving the accelerator's efficiency and reducing latency. The architecture includes a novel binary adder-search tree that generates the output spike train, decoupling the chronological dependence in the accumulation-comparison operation. Furthermore, SATO employs a bucket-sort-based method to distribute compressed workloads evenly across all PEs, maximizing data locality of input spike trains. Experimental results on various SNN models show that SATO outperforms the 8-bit version of the well-known accelerator "Eyeriss" by 20.7× in speedup and 6.0× in energy savings, on average. Compared to the state-of-the-art SNN accelerator "SpinalFlow", SATO achieves a 4.6× performance gain and a 3.1× energy reduction on average.
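
To make the decoupling idea concrete, the following is a minimal software sketch, not SATO's hardware design. It assumes a single non-leaky integrate-and-fire neuron with no reset that reports only its first output spike; the function names and the `spikes[t][i]` layout are hypothetical choices made for illustration. Stage 1 computes the membrane-potential increment of every time step independently, so those sums could run in parallel. Stage 2 takes prefix sums of the increments and searches for the first threshold crossing, a plain-Python stand-in for the role the abstract assigns to the binary adder-search tree.

```python
from itertools import accumulate


def serial_first_spike(weights, spikes, threshold):
    """Baseline: accumulate and compare one time step after another."""
    potential = 0
    for t, step in enumerate(spikes):
        # Add the weights of all inputs that spiked at time step t.
        potential += sum(w for w, s in zip(weights, step) if s)
        if potential >= threshold:
            return t
    return None  # threshold never reached


def temporal_parallel_first_spike(weights, spikes, threshold):
    """Two-stage variant (illustrative assumption, not the paper's circuit)."""
    # Stage 1: each time step's increment depends only on that step's
    # input spikes, so these sums are independent of one another.
    increments = [
        sum(w for w, s in zip(weights, step) if s) for step in spikes
    ]
    # Stage 2: prefix sums give the potential after every time step;
    # the first index at or above the threshold is the output spike time.
    potentials = accumulate(increments)
    for t, potential in enumerate(potentials):
        if potential >= threshold:
            return t
    return None


# Hypothetical example: 3 inputs, 4 time steps; spikes[t][i] is 0 or 1.
weights = [2, -1, 3]
spikes = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]]
first_serial = serial_first_spike(weights, spikes, threshold=5)
first_parallel = temporal_parallel_first_spike(weights, spikes, threshold=5)
```

Both functions return the same spike time for any input. In this sketch only the second one separates the per-time-step sums from the comparison, which is the property the abstract attributes to the two-stage pipeline.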
