Abstract: Spiking neural networks (SNNs) are regarded as energy-efficient alternatives to deep neural networks (DNNs). By adopting event-driven information processing, SNNs can significantly reduce the computational demands associated with DNNs while still achieving comparable performance. However, current SNNs primarily prioritize high accuracy and large sparsity by constructing complex neuron models that generate sparse spikes. Unfortunately, this approach results in low energy efficiency and high latency, posing a significant challenge for deploying SNNs at the edge. Furthermore, the dominant computation in SNNs, spike-wise Add-Accumulate operations, is well suited to processing-in-memory (PIM) architectures. However, exploiting highly parallel processing and spike sparsity in PIM-based SNN accelerators is challenging because spikes are irregular and time-dependent. To address these issues, this paper proposes LowPASS, an algorithm-hardware co-design framework. LowPASS exploits the inherently rich sparsity of SNN spike activity using an in-situ compute-enabled crossbar architecture to achieve high energy efficiency and low latency without compromising accuracy. First, we selectively merge multiple time steps into single-shot computations based on output sparsity, enabling speculative fast forwarding. This approach improves crossbar utilization by reducing idling during sparse spike processing. Second, we introduce a dedicated PIM-based architecture for SNNs that leverages their large sparsity for highly parallel and energy-efficient computation. LowPASS incorporates popcount-based circuits to efficiently handle merged time steps on the crossbar, maximizing both the use of spike sparsity and crossbar parallelism. Evaluations show that LowPASS reduces energy by 6,860× compared with the previous ANN accelerator Eyeriss (8-bit version). Compared with the state-of-the-art SNN accelerator T2FSNN, LowPASS achieves a 2,147× energy reduction. Even compared with a sparsity-aware PIM architecture, LowPASS still shows a 5.09× energy efficiency gain. These results demonstrate the performance and energy efficiency of LowPASS, making it a compelling choice for SNN inference tasks.
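The idea of merging several time steps into one crossbar pass can be illustrated with a minimal sketch. It is not the LowPASS implementation: it only shows that, for the accumulate step, summing the weighted inputs over T time steps equals a single matrix-vector product with per-input spike counts (a popcount over time). The function names, the `max_density` gate, and the example values are all hypothetical, and the sketch ignores neuron threshold and reset dynamics between steps, which is presumably why such merging has to be speculative.

```python
def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def stepwise_accumulate(weights, spikes):
    """Baseline: one crossbar pass per time step, outputs summed over time."""
    total = [0] * len(weights)
    for step in spikes:  # each step is a binary spike vector
        out = matvec(weights, step)
        total = [a + b for a, b in zip(total, out)]
    return total


def merged_accumulate(weights, spikes):
    """Merged: popcount each input line over time, then a single pass."""
    counts = [sum(line) for line in zip(*spikes)]  # spikes per input line
    return matvec(weights, counts)


def spike_density(spikes):
    """Fraction of (time step, input) slots that carry a spike."""
    slots = sum(len(step) for step in spikes)
    return sum(sum(step) for step in spikes) / slots


def accumulate(weights, spikes, max_density=0.5):
    """Hypothetical gate: merge only when spike activity is sparse enough."""
    if spike_density(spikes) <= max_density:
        return merged_accumulate(weights, spikes)
    return stepwise_accumulate(weights, spikes)


# Illustrative values only: 2 outputs, 3 inputs, 3 time steps.
example_weights = [[1, -2, 3], [0, 4, -1]]
example_spikes = [[1, 0, 0], [0, 0, 1], [1, 0, 1]]
stepwise_result = stepwise_accumulate(example_weights, example_spikes)
merged_result = merged_accumulate(example_weights, example_spikes)
```

Both paths give the same accumulated value here because the accumulate step is linear in the input spikes; the merged path simply replaces three sparse passes with one pass over small integer counts.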

LowPASS: A Low Power PIM-based Accelerator with Speculative Scheme for SNNs

COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation

Exploiting Temporal-Unrolled Parallelism for Energy-Efficient SNN Acceleration

An Event-driven Spiking Neural Network Accelerator with On-chip Sparse Weight

SPAT: FPGA-based Sparsity-Optimized Spiking Neural Network Training Accelerator with Temporal Parallel Dataflow

A Low Power and Low Latency FPGA-Based Spiking Neural Network Accelerator

An Efficient Spiking Neural Network Accelerator with Sparse Weight

SATO: spiking neural network acceleration via temporal-oriented dataflow and architecture

An Energy-Efficient Spiking Neural Network Accelerator Based on Spatio-Temporal Redundancy Reduction

SparseNN: an Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

PULSE: Parametric Hardware Units for Low-power Sparsity-Aware Convolution Engine

ESSA: Design of a Programmable Efficient Sparse Spiking Neural Network Accelerator

A Spiking Neural Network Accelerator based on Ping-Pong Architecture with Sparse Spike and Weight

MF-DSNN: An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration

A Convolutional Spiking Neural Network Accelerator with the Sparsity-Aware Memory and Compressed Weights

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning

Randomize and Match: Exploiting Irregular Sparsity for Energy Efficient Processing in SNNs

A Reconfigurable FPGA-based Spiking Neural Network Accelerator

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

30.2 A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity-Adaptive In-Memory Computing