Abstract: Spiking neural networks (SNNs) are promising alternatives to artificial neural networks (ANNs) because they are more biologically realistic, brain-inspired computing models. Neurons in SNNs fire sparsely over time, a property known as spatiotemporal sparsity, which can enable energy-efficient hardware inference. However, exploiting this sparsity in hardware leads to unpredictable and unbalanced workloads, which degrade energy efficiency. Compared with SNNs that use simple fully connected structures, SNNs with richer structures (e.g., standard, depthwise, and pointwise convolutions) can handle more complicated tasks but are harder to map onto hardware. In this work, we propose Cerebron, a novel reconfigurable architecture that fully exploits the spatiotemporal sparsity of SNNs with maximized data reuse, together with optimization techniques that improve the efficiency and flexibility of the hardware. For flexibility, the reconfigurable compute engine is compatible with a variety of spiking layers and supports both inter-computing-unit (inter-CU) and intra-CU reconfiguration. For memory efficiency, the compute engine exploits data reuse and guarantees parallel data access when processing different types of convolution. A two-step data sparsity exploitation method leverages the sparsity of discrete spikes to reduce computation time. In addition, an online channelwise workload scheduling strategy further reduces latency. Cerebron is verified on image segmentation and classification tasks using a variety of state-of-the-art spiking network structures. Experimental results show that Cerebron achieves at least a $17.5\times$ reduction in prediction energy and a $20\times$ speedup compared with state-of-the-art field-programmable gate array (FPGA)-based accelerators.
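To make the notion of spatiotemporal sparsity concrete, the following minimal Python sketch compares a dense accumulation with an event-driven one for a single fully connected spiking layer. It is an illustration only and does not reproduce Cerebron's hardware dataflow, its two-step sparsity method, or its scheduler. The leaky integrate-and-fire model, the function names, the layer sizes, and the 10% firing rate are all assumptions chosen for the example.

```python
import random


def dense_accumulate(spikes, weights):
    """Dense baseline: every input (0 or 1) is multiplied by every weight."""
    n_out = len(weights[0])
    out = [0.0] * n_out
    ops = 0
    for i, s in enumerate(spikes):
        for j in range(n_out):
            out[j] += s * weights[i][j]
            ops += 1
    return out, ops


def event_driven_accumulate(spikes, weights):
    """Event-driven version: inputs that did not fire are skipped entirely,
    and a binary spike needs only an addition, not a multiplication."""
    n_out = len(weights[0])
    out = [0.0] * n_out
    ops = 0
    for i, s in enumerate(spikes):
        if s == 0:
            continue  # no spike, no work
        for j in range(n_out):
            out[j] += weights[i][j]
            ops += 1
    return out, ops


def lif_step(v, current, threshold=1.0, leak=0.9):
    """One leaky integrate-and-fire update (assumed neuron model).
    Returns the new membrane potentials and the binary output spikes."""
    new_v, out_spikes = [], []
    for vi, ci in zip(v, current):
        vi = leak * vi + ci
        fired = 1 if vi >= threshold else 0
        new_v.append(0.0 if fired else vi)  # reset to zero after firing
        out_spikes.append(fired)
    return new_v, out_spikes


def run_demo(n_in=64, n_out=8, steps=10, rate=0.1, seed=0):
    """Run a few time steps and count the operations each variant performs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)]
               for _ in range(n_in)]
    v = [0.0] * n_out
    dense_ops = sparse_ops = 0
    for _ in range(steps):
        spikes = [1 if rng.random() < rate else 0 for _ in range(n_in)]
        dense_out, d_ops = dense_accumulate(spikes, weights)
        sparse_out, s_ops = event_driven_accumulate(spikes, weights)
        # Both variants must produce the same input current.
        assert all(abs(a - b) < 1e-9 for a, b in zip(dense_out, sparse_out))
        v, _ = lif_step(v, sparse_out)
        dense_ops += d_ops
        sparse_ops += s_ops
    return dense_ops, sparse_ops


if __name__ == "__main__":
    dense_ops, sparse_ops = run_demo()
    print("dense operations:       ", dense_ops)
    print("event-driven operations:", sparse_ops)
```

In this sketch the event-driven operation count depends on how many inputs happen to fire at each time step, so it varies from step to step. That variability is a software analogue of the unpredictable and unbalanced workloads the abstract describes.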

An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

Flash-Based Content Addressable Memory with L2 Distance for Memory-Augmented Neural Network

A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network

NeuroNAS: A Framework for Energy-Efficient Neuromorphic Compute-in-Memory Systems using Hardware-Aware Spiking Neural Architecture Search

Quantized Memory-Augmented Neural Networks

HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer

Distributed Associative Memory Network with Memory Refreshing Loss

Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures

Efficient Memory Management for Deep Neural Net Inference

Survey on Memory-Augmented Neural Networks: Cognitive Insights to AI Applications

Reconfigurable Architecture for Neural Approximation in Multimedia Computing

An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller

Flash Memory Array for Efficient Implementation of Deep Neural Networks

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving

Cerebron: A Reconfigurable Architecture for Spatiotemporal Sparse Spiking Neural Networks

EPHA: An Energy-efficient Parallel Hybrid Architecture for ANNs and SNNs

ARAS: An Adaptive Low-Cost ReRAM-Based Accelerator for DNNs