Abstract: Sparsity in Convolutional Neural Networks (CNNs) is widely studied as a way to compress and accelerate models in resource-limited environments. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent 1$\times$N sparse network has become popular for three advantages: 1) large storage savings through the \emph{Block Sparse Row} matrix format; 2) strong accuracy at high sparsity; 3) significant speedups on CPUs with Advanced Vector Extensions. Existing work, however, selects and fine-tunes 1$\times$N sparse weights starting from dense pre-trained weights, which leads to problems such as expensive training cost and memory access, sub-optimal model quality, and an unbalanced workload across threads (because sparsity differs across output channels). To overcome these problems, this paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach that trains a uniform 1$\times$N sparse structured network from scratch. Specifically, throughout the training process our approach repeatedly allows pruned blocks to regrow into the network in a uniform manner, based on block angular redundancy and importance sampling. This makes the model less dependent on pre-training, reduces model redundancy and the risk of permanently pruning important blocks, and achieves a balanced workload. Empirically, comprehensive experiments on ImageNet across various CNN architectures show that SUBP consistently outperforms existing 1$\times$N and structured sparsity methods, whether they are based on pre-trained models or trained from scratch. Source code and models are available at \url{https://github.com/JingyangXiang/SUBP}.
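To make the uniform 1$\times$N constraint concrete, the following is a minimal sketch and not the SUBP algorithm itself. It assumes a weight matrix stored as a list of rows with shape (output channels, flattened input dimension), and it scores each block by its L1 norm, which is a stand-in assumption for the block angular redundancy and importance sampling criteria named in the abstract. The function name `uniform_1xn_mask` and its parameters are hypothetical.

```python
def uniform_1xn_mask(weight, n=4, sparsity=0.5):
    """Build a 0/1 mask with uniform 1xN block sparsity (illustrative only).

    weight:   list of c_out rows, each a list of k numbers.
    n:        number of consecutive output channels forming one block.
    sparsity: fraction of blocks pruned in every group of n channels.
    """
    c_out, k = len(weight), len(weight[0])
    if c_out % n != 0:
        raise ValueError("number of output channels must be divisible by n")
    # Every group keeps the same number of blocks: this is the "uniform" part.
    keep = k - int(round(sparsity * k))
    mask = [[0] * k for _ in range(c_out)]
    for g in range(0, c_out, n):
        # Block j of this group covers rows g..g+n-1 at column j.
        # Assumption: score a block by the L1 norm of its n weights.
        scores = [sum(abs(weight[g + i][j]) for i in range(n)) for j in range(k)]
        kept = sorted(range(k), key=lambda j: scores[j], reverse=True)[:keep]
        for j in kept:
            for i in range(n):
                mask[g + i][j] = 1  # keep or drop all n weights together
    return mask
```

Because every group of N output channels retains the same number of blocks, each thread that processes one group has the same amount of work, which is the balanced-workload property the abstract describes. In a soft pruning setup such a mask would be recomputed periodically during training, so blocks that are zeroed at one step can regrow later.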

SUBP: Soft Uniform Block Pruning for 1$\times$N Sparse CNNs Multithreading Acceleration

