Marmotini: A Weight Density Adaptation Architecture with Hybrid Compression Method for Spiking Neural Network

Zilin Wang, Yi Zhong, Zehong Ou, Youming Yang, Shuo Feng, Guang Chen, Xiaoxin Cui, Song Jia, Yuan Wang
DOI: https://doi.org/10.1109/tvlsi.2024.3453897
2024-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Brain-inspired spiking neural networks (SNNs) have recently attracted widespread interest owing to their event-driven nature and the relatively low-power hardware used to transmit highly sparse binary spikes. To further improve energy efficiency, matrix compression algorithms are used for weight storage. However, weight sparsity varies greatly between layers, so in a multicore neuromorphic system a single compression algorithm can hardly suit all the layers of an SNN model. In this work, we propose a weight density adaptation architecture with a hybrid compression method for SNNs, named Marmotini. It is a multicore heterogeneous design with three types of cores that handle computation at different levels of weight sparsity. Benefiting from the hybrid compression method, Marmotini minimizes the waste of neurons and weights. In addition, for better flexibility, a reconfigurable core is proposed that can be configured to compute either a convolutional layer or a fully connected layer. Implemented on a Xilinx Kintex UltraScale XCKU115 field-programmable gate array (FPGA) board, Marmotini operates at 150 MHz, achieving 244.6-GSOP/s peak performance and 54.1-GSOP/W energy efficiency at 0% spike sparsity.
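The idea of matching a storage format to each layer's weight density can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the compression formats or the selection rule, so the three formats below (dense, bitmap, CSR-style), the 8-bit weight width, and the function names `storage_bits` and `pick_format` are all assumptions chosen for illustration. The sketch simply estimates the storage cost of a weight matrix under each format and picks the cheapest.

```python
import numpy as np


def storage_bits(weights, weight_bits=8):
    """Estimate storage cost in bits under three hypothetical formats.

    The formats are illustrative stand-ins, not the ones used in Marmotini.
    """
    rows, cols = weights.shape
    nnz = int(np.count_nonzero(weights))
    # Bits needed to address a column and to hold a row pointer (CSR-style).
    col_index_bits = max(1, int(np.ceil(np.log2(cols))))
    row_ptr_bits = max(1, int(np.ceil(np.log2(nnz + 1))))
    return {
        # Every weight stored, zeros included.
        "dense": rows * cols * weight_bits,
        # One presence bit per position plus the nonzero values.
        "bitmap": rows * cols + nnz * weight_bits,
        # Nonzero values with column indices, plus one pointer per row.
        "csr": nnz * (weight_bits + col_index_bits) + (rows + 1) * row_ptr_bits,
    }


def pick_format(weights, weight_bits=8):
    """Return the name of the cheapest format for this weight matrix."""
    costs = storage_bits(weights, weight_bits)
    return min(costs, key=costs.get)


# Three synthetic 64x64 layers with different weight densities.
full = np.ones((64, 64))          # 100% dense
half = np.zeros((64, 64))
half[:, ::2] = 1                  # 50% dense
sparse = np.zeros((64, 64))
sparse[0, :41] = 1                # about 1% dense

for name, layer in [("full", full), ("half", half), ("sparse", sparse)]:
    print(name, pick_format(layer))
```

Under these assumed cost models, the preferred format changes with density, which is the motivation the abstract gives for dedicating different core types to layers of different weight sparsity.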