Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread as a way to compress and accelerate models in resource-limited environments. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1xN sparsity has become popular for three advantages: 1) large storage savings through the Block Sparse Row matrix format; 2) excellent performance at high sparsity; 3) significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1xN sparse weights based on dense pre-trained weights, which leads to problems such as expensive training cost and memory access, sub-optimal model quality, and unbalanced workload across threads (caused by different sparsity across output channels). To overcome these problems, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach that trains a uniform 1xN sparse structured network from scratch. Specifically, our approach repeatedly allows pruned blocks to regrow into the network, based on block angular redundancy and importance sampling, in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training and reduces both model redundancy and the risk of permanently pruning important blocks, but also achieves a balanced workload. Empirically, comprehensive experiments on ImageNet across various CNN architectures show that SUBP consistently outperforms existing 1xN and structured sparsity methods, whether they are based on pre-trained models or trained from scratch. Source code and models are available at https://github.com/JingyangXiang/SUBP.
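
As an illustration of the uniform 1xN constraint described in the abstract, the following minimal sketch builds a mask in which every group of N consecutive output channels keeps the same number of 1xN blocks. It is not the SUBP algorithm: the block score used here is a plain L1 norm with top-k selection, swapped in for the paper's block angular redundancy and importance sampling, and there is no regrowth step. The function name `uniform_1xn_mask`, its parameters, and the toy 4x4 weight matrix (rows are output channels, columns are input positions) are all assumptions made for this example.

```python
def uniform_1xn_mask(weight, n, sparsity):
    """Return a 0/1 mask keeping the same number of 1xN blocks per row group.

    weight   : list of rows, shape (c_out, c_in); rows are output channels
    n        : block length along the output-channel axis
    sparsity : fraction of blocks to prune in every group (0 <= sparsity < 1)
    """
    c_out, c_in = len(weight), len(weight[0])
    if c_out % n != 0:
        raise ValueError("number of output channels must be divisible by n")

    # Same number of kept blocks in every group: this is the "uniform" part,
    # which gives each thread an equal amount of work.
    keep = max(1, round((1.0 - sparsity) * c_in))

    mask = [[0] * c_in for _ in range(c_out)]
    for start in range(0, c_out, n):
        rows = range(start, start + n)
        # A block is n consecutive output channels at one input position.
        # Assumption: score each block by its L1 norm (not the paper's criterion).
        scores = [sum(abs(weight[r][c]) for r in rows) for c in range(c_in)]
        kept = sorted(range(c_in), key=lambda c: scores[c], reverse=True)[:keep]
        for c in kept:
            for r in rows:
                mask[r][c] = 1
    return mask


# Hypothetical toy weights: 4 output channels, 4 input positions.
weight = [
    [0.9, 0.1, -0.2, 0.0],
    [0.8, -0.1, 0.3, 0.1],
    [0.0, 0.7, 0.1, -0.6],
    [0.1, -0.9, 0.0, 0.5],
]
mask = uniform_1xn_mask(weight, n=2, sparsity=0.5)
for row in mask:
    print(row)
```

With N = 2 and 50% sparsity, each pair of output channels keeps exactly two blocks, so the non-zero pattern is identical within a pair and the block count is identical across pairs.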

When Sparsity Meets Dynamic Convolution

Deep Neural Network Acceleration with Sparse Prediction Layers

Joint Sparsity with Mixed Granularity for Efficient GPU Implementation

SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

MaxQ: Multi-Axis Query for N:M Sparsity Network

DTS: Dynamic Training Slimming with Feature Sparsity for Efficient Convolutional Neural Network

Efficient Network Compression Through Smooth-Lasso Constraint

Multi-Dimensional Dynamic Pruning: Exploring Spatial and Channel Fuzzy Sparsity

Inducing Semi-Structured Sparsity by Masking for Efficient Model Inference in Convolutional Networks

Dynamic CNN Accelerator Supporting Efficient Filter Generator with Kernel Enhancement and Online Channel Pruning

Frequency-Domain Dynamic Pruning for Convolutional Neural Networks

Students and teachers learning together: a robust training strategy for neural network pruning

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

PCONV: the Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Exploring Fine-Grained Sparsity in Convolutional Neural Networks for Efficient Inference

Dynamic Structure Pruning for Compressing CNNs

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

Learning Efficient Convolutional Networks Through Network Slimming

Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks

Enabling Sparse Winograd Convolution by Native Pruning