Abstract: Sparsity in Convolutional Neural Networks (CNNs) has been widely studied as a way to compress and accelerate models in resource-constrained environments. Recent networks with 1$\times$N sparsity constrain N consecutive weights along the output channel to be group-wise non-zero. They have gained considerable popularity for three notable advantages: 1) large storage savings through a \emph{Block Sparse Row} matrix; 2) excellent performance at high sparsity; and 3) significant speedups on CPUs with Advanced Vector Extensions. Existing work selects and fine-tunes 1$\times$N sparse weights starting from dense pre-trained weights. This leads to several problems: expensive training cost and memory access, sub-optimal model quality, and unbalanced workload across threads, because sparsity differs across output channels. To overcome these problems, this paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach that trains a uniform 1$\times$N sparse structured network from scratch. Specifically, throughout training our approach repeatedly allows pruned blocks to regrow into the network in a uniform manner, based on block angular redundancy and importance sampling. This reduces the model's dependence on pre-training, its redundancy, and the risk of permanently pruning important blocks. It also achieves a balanced workload. Empirically, comprehensive experiments on ImageNet across various CNN architectures show that SUBP consistently outperforms existing 1$\times$N and structured sparsity methods, whether those methods start from pre-trained models or train from scratch. Source code and models are available at \url{https://github.com/JingyangXiang/SUBP}.
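The following is a minimal NumPy sketch that makes the 1$\times$N pattern and its Block Sparse Row storage concrete. It rests on several assumptions. A convolution weight is flattened to shape (C_out, C_in·kh·kw), and a block is N consecutive output channels at one input position. Every group of N output channels keeps the same number of blocks, which is the uniform property tied to balanced workload. Blocks are scored by plain L1 magnitude. That score is a simple stand-in and is not SUBP's angular-redundancy and importance-sampling criterion. The function names are hypothetical.

```python
import numpy as np


def uniform_1xn_mask(weight: np.ndarray, n: int, sparsity: float) -> np.ndarray:
    """Hypothetical helper: 0/1 mask with 1xN blocks and equal kept-block
    counts in every group of N output channels.

    weight: conv weight flattened to (C_out, C_in * kh * kw).
    Blocks are ranked by L1 magnitude (a stand-in, NOT SUBP's criterion).
    """
    c_out, k = weight.shape
    assert c_out % n == 0, "C_out must be divisible by N"
    groups = weight.reshape(c_out // n, n, k)       # (G, N, K)
    scores = np.abs(groups).sum(axis=1)             # (G, K): one score per block
    keep = k - int(round(sparsity * k))             # same count for every group
    top = np.argsort(-scores, axis=1)[:, :keep]     # best blocks per group
    block_mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(block_mask, top, True, axis=1)
    # Broadcast each block decision to its N output channels.
    mask = np.repeat(block_mask[:, None, :], n, axis=1).reshape(c_out, k)
    return mask.astype(weight.dtype)


def to_bsr(weight: np.ndarray, mask: np.ndarray, n: int):
    """Hypothetical helper: encode masked weights as BSR with (N, 1) blocks.

    Returns (indptr, indices, blocks): block row g owns
    indices[indptr[g]:indptr[g + 1]] and the matching blocks.
    """
    c_out, k = weight.shape
    block_mask = mask.reshape(c_out // n, n, k)[:, 0, :].astype(bool)  # (G, K)
    indptr, indices, data = [0], [], []
    for g in range(c_out // n):
        cols = np.flatnonzero(block_mask[g])
        indices.extend(cols.tolist())
        data.extend(weight[g * n:(g + 1) * n, c] for c in cols)  # each (N,)
        indptr.append(len(indices))
    blocks = np.array(data, dtype=weight.dtype).reshape(-1, n, 1)
    return np.array(indptr), np.array(indices, dtype=np.int64), blocks


def from_bsr(indptr, indices, blocks, shape, n):
    """Decode the BSR triple back to a dense matrix (for checking)."""
    dense = np.zeros(shape, dtype=blocks.dtype)
    for g in range(len(indptr) - 1):
        for j in range(indptr[g], indptr[g + 1]):
            dense[g * n:(g + 1) * n, indices[j]] = blocks[j, :, 0]
    return dense


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 12)).astype(np.float32)  # C_out=8, C_in*kh*kw=12
m = uniform_1xn_mask(w, n=4, sparsity=0.5)
indptr, indices, blocks = to_bsr(w, m, n=4)
print(indptr.tolist())  # [0, 6, 12]: both block rows store 6 blocks
```

Because every block row keeps the same number of blocks, `indptr` grows in equal steps. Block rows can therefore be split evenly across threads. With per-channel sparsity that varies, some threads would receive more nonzero blocks than others.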

Learning soft threshold for sparse reparameterization using gradual projection operators

Deep Neural Network Acceleration with Sparse Prediction Layers

SUBP: Soft Uniform Block Pruning for 1×N Sparse CNNs Multithreading Acceleration

Batch-Normalization-based Soft Filter Pruning for Deep Convolutional Neural Networks


Joint Sparsity with Mixed Granularity for Efficient GPU Implementation

Structured Probabilistic Pruning for Convolutional Neural Network Acceleration

Efficient Network Compression Through Smooth-Lasso Constraint

Neural Network Pruning by Gradient Descent

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

FGGP: Fixed-Rate Gradient-First Gradual Pruning

Spatial-Winograd Pruning Enabling Sparse Winograd Convolution

Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation

Pruning in Training: Learning and Ranking Sparse Connections in Deep Convolutional Networks

Global balanced iterative pruning for efficient convolutional neural networks

Efficient Neural Networks with Spatial Wise Sparsity Using Unified Importance Map

Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

CGaP: Continuous Growth and Pruning for Efficient Deep Learning

Sparse optimization guided pruning for neural networks

Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution

DTS: Dynamic Training Slimming with Feature Sparsity for Efficient Convolutional Neural Network