Abstract: As is well known, the huge memory and compute costs of both artificial neural networks (ANNs) and spiking neural networks (SNNs) greatly hinder their efficient deployment on edge devices. Model compression has been proposed as a promising technique to improve running efficiency by reducing parameters and operations, but it has mainly been practiced on ANNs rather than SNNs. It is therefore interesting to ask how much an SNN model can be compressed without compromising its functionality. Two challenges must be addressed: 1) the accuracy of SNNs is usually sensitive to model compression, which requires an accurate compression methodology, and 2) the computation of SNNs is event-driven rather than static, which adds an extra compression dimension on dynamic spikes. To this end, we realize a comprehensive SNN compression through three steps. First, we formulate connection pruning and weight quantization as a constrained optimization problem. Second, we combine spatiotemporal backpropagation (STBP) and the alternating direction method of multipliers (ADMM) to solve the problem with minimum accuracy loss. Third, we further propose activity regularization to reduce the number of spike events and thus the number of active operations. These methods can be applied individually for moderate compression or jointly for aggressive compression. We define several quantitative metrics to evaluate the compression performance for SNNs. Our methodology is validated in pattern recognition tasks over the MNIST, N-MNIST, CIFAR10, and CIFAR100 datasets, where extensive comparisons, analyses, and insights are provided. To the best of our knowledge, this is the first work that studies SNN compression in a comprehensive manner by exploiting all compressible components and achieves better results.
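The pruning step summarized above can be illustrated with a minimal sketch of ADMM under a sparsity constraint. This is not the authors' implementation: it assumes a toy least-squares loss in place of an SNN trained with STBP, plain gradient descent for the weight update, and hypothetical names (`project_topk`, `admm_prune`, `rho`, `lr`) chosen only for illustration. It shows the three alternating updates: a gradient step on the loss plus the augmented penalty, a projection onto the set of vectors with at most `k` nonzeros, and the dual update.

```python
import numpy as np

def project_topk(w, k):
    """Project w onto {z : at most k nonzeros} by keeping the k largest magnitudes."""
    z = np.zeros_like(w)
    if k > 0:
        idx = np.argsort(np.abs(w))[-k:]  # indices of the k largest |w_i|
        z[idx] = w[idx]
    return z

def admm_prune(X, y, k, rho=1.0, lr=0.05, outer=50, inner=20):
    """Toy ADMM pruning on a least-squares loss (a stand-in for STBP training)."""
    n, d = X.shape
    w = np.zeros(d)  # trainable weights
    z = np.zeros(d)  # sparse auxiliary copy of the weights
    u = np.zeros(d)  # scaled dual variable
    for _ in range(outer):
        # W-update: gradient descent on loss + (rho/2) * ||w - z + u||^2
        for _ in range(inner):
            grad = X.T @ (X @ w - y) / n + rho * (w - z + u)
            w -= lr * grad
        # Z-update: projection onto the sparsity constraint set
        z = project_topk(w + u, k)
        # Dual update: accumulate the constraint violation w - z
        u += w - z
    return w, z

# Synthetic data (an assumption for the demo): 10 weights, 3 of them nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[1, 4, 7]] = [2.0, -3.0, 1.5]
y = X @ w_true

w, z = admm_prune(X, y, k=3)
print("nonzeros kept:", np.count_nonzero(z))
```

Weight quantization would fit the same loop if the projection in the Z-update mapped each weight to the nearest level in a discrete set instead of keeping the top `k` entries. Activity regularization would enter as an additional penalty on spike counts in the training loss.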

A Compressed Data Partition and Loop Scheduling Scheme for Neural Networks

Efficient Structure Slimming for Spiking Neural Networks

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Efficient Neural Network Compression Inspired by Compressive Sensing.

CMD: Controllable Matrix Decomposition with Global Optimization for Deep Neural Network Compression

Neural Network Compression Via Sparse Optimization

A Computing Efficient Hardware Architecture for Sparse Deep Neural Network Computing

Compressing Deep Networks by Neuron Agglomerative Clustering

Deep Architecture Compression with Automatic Clustering of Similar Neurons

Comprehensive SNN Compression Using ADMM Optimization and Activity Regularization

An efficient pruning and fine-tuning method for deep spiking neural network

Highly Efficient Sparse Neural Network Computing - Hardware and Software Solutions.

Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks

ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator.

Optimizing Off-Chip Memory Access for Deep Neural Network Accelerator

Exploring Resource-Aware Deep Neural Network Accelerator and Architecture Design

Neural Network Compression Based on Tensor Ring Decomposition

Learning the sparsity for ReRAM - mapping and pruning sparse neural network for ReRAM based accelerator.

Crossbar-Aligned & Integer-Only Neural Network Compression for Efficient In-Memory Acceleration

A Heuristic and Greedy Weight Remapping Scheme with Hardware Optimization for Irregular Sparse Neural Networks Implemented on CIM Accelerator in Edge AI Applications

Learning the Sparsity for ReRAM