ESSA: Design of a Programmable Efficient Sparse Spiking Neural Network Accelerator

Yisong Kuang, Xiaoxin Cui, Zilin Wang, Chenglong Zou, Yi Zhong, Kefei Liu, Zhenhui Dai, Dunshan Yu, Yuan Wang, Ru Huang
DOI: https://doi.org/10.1109/tvlsi.2022.3183126
2022-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Spiking neural networks (SNNs) are trending toward smaller model sizes and higher hardware efficiency for area- and energy-constrained applications, which is achieved through model pruning and data compression. However, it is challenging to exploit the unstructured sparsity of SNNs on dense neuromorphic processors. In this article, we present an efficient sparse SNN accelerator (ESSA), which leverages both the temporal sparsity of spike events and the spatial sparsity of weights in SNN inference. It supports both compressed weights for sparse SNNs and uncompressed weights for compact SNNs. A self-adaptive spike compression scheme is proposed for sparse spike scenarios, improving throughput by $3.2\times$. ESSA offers a flexible fan-in–fan-out tradeoff by using combinable dendrites, which overcomes the fan-in limitation in neuromorphic systems. Furthermore, a low-latency intrachip spike multicast method is adopted to reduce the resource overhead. Implemented on a Xilinx Kintex UltraScale field-programmable gate array (FPGA), ESSA achieves an equivalent performance of 253.1 GSOP/s and an energy efficiency of 32.1 GSOP/W for 75% weight sparsity at 140 MHz. A four-layer fully connected SNN implemented on ESSA is expected to take $2.6~\mu\text{s}$ per time step, with an energy consumption of $14.6~\mu\text{J}$. Our results demonstrate that ESSA outperforms several state-of-the-art application-specific integrated circuit (ASIC) and FPGA neuromorphic processors.
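To make the two kinds of sparsity named in the abstract concrete, the following is a minimal software sketch of event-driven inference over pruned weights. It is an illustration under stated assumptions, not ESSA's datapath or compression format: the per-row (index, weight) storage, the function names `compress_weights` and `step`, the reset-to-zero rule, and the toy weight matrix are all hypothetical.

```python
# Illustrative sketch only: the storage layout, names, and reset rule
# below are assumptions, not the format or logic used by ESSA.

def compress_weights(dense):
    """Keep only nonzero weights, stored per presynaptic row as (post_index, weight)."""
    return [[(j, w) for j, w in enumerate(row) if w != 0] for row in dense]


def step(spike_events, compressed, potentials, threshold):
    """Run one time step of integrate-and-fire driven only by active inputs.

    Returns the indices of neurons that fired and the number of
    synaptic operations (SOPs) actually performed.
    """
    sops = 0
    for pre in spike_events:             # temporal sparsity: silent inputs are skipped
        for post, w in compressed[pre]:  # spatial sparsity: pruned weights are skipped
            potentials[post] += w
            sops += 1
    fired = []
    for i, v in enumerate(potentials):
        if v >= threshold:
            fired.append(i)
            potentials[i] = 0            # assumed reset-to-zero after a spike
    return fired, sops


# Toy layer: 3 inputs, 4 output neurons, 8 of 12 weights pruned to zero.
dense = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]
compressed = compress_weights(dense)
potentials = [0, 0, 0, 0]

# Only inputs 0 and 2 spike in this time step.
fired, sops = step([0, 2], compressed, potentials, threshold=2)
dense_sops = len(dense) * len(dense[0])  # work a dense, non-event-driven pass would do

print(fired, sops, dense_sops)
```

In this toy case the sparse pass performs 3 synaptic operations where a dense pass over the same layer would perform 12. That gap between work actually done and dense-equivalent work is one plausible reading of why the abstract reports an "equivalent" performance figure at a stated weight sparsity.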