Abstract: We propose an algorithm that identifies and eliminates irrelevant layers of a neural network during the early stages of training. In contrast to weight- or filter-level pruning, layer pruning reduces the sequential computation of a neural network, which is harder to parallelize. We employ a structure with residual connections around nonlinear network sections, which allows information to keep flowing through the network once a nonlinear section is pruned. Our approach is based on variational inference principles, uses Gaussian scale mixture priors on the neural network weights, and allows for substantial cost savings during both training and inference. More specifically, we learn the variational posterior distribution of scalar Bernoulli random variables that multiply the layer weight matrices of the network's nonlinear sections, similarly to adaptive layer-wise dropout. To overcome the challenges of concurrent learning and pruning, such as premature pruning and a lack of robustness with respect to weight initialization or the size of the starting network, we adopt the "flattening" hyper-prior on the prior parameters. We prove that, as a result of this hyper-prior, the solutions of the resulting optimization problem describe deterministic networks whose posterior-distribution parameters are either 0 or 1. We formulate a projected SGD algorithm and prove its convergence to such a solution using stochastic approximation results. In particular, we prove conditions under which a layer's weights converge to zero and derive practical pruning conditions from the theoretical results. The proposed algorithm is evaluated on the MNIST, CIFAR-10 and ImageNet datasets with common LeNet, VGG16 and ResNet architectures. The simulations demonstrate that our method achieves state-of-the-art performance for layer pruning at a lower computational cost than competing methods, owing to the concurrent training and pruning.
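The core mechanism described in the abstract can be illustrated with a minimal pure-Python sketch: residual sections gated by a learned Bernoulli keep-probability, a projected gradient step that clips that probability back into [0, 1], and pruning once it reaches zero. This is not the authors' algorithm. The scalar model, the toy objective with gradient `lam - u` (a hypothetical per-layer cost `lam` minus a hypothetical usefulness score `u`), and all function names are assumptions made for illustration. The paper's actual gradients come from a variational objective with Gaussian scale mixture priors and the flattening hyper-prior, neither of which is modeled here.

```python
import random
from typing import Callable, List


def project(theta: float) -> float:
    """Project a Bernoulli keep-probability back onto the feasible interval [0, 1]."""
    return min(1.0, max(0.0, theta))


def gated_residual(x: float, theta: float, f: Callable[[float], float],
                   rng: random.Random) -> float:
    """Residual section x + z * f(x), where the gate z ~ Bernoulli(theta).

    When z = 0 the nonlinear section f is skipped, but the identity path still
    carries x forward, so a pruned section does not cut the network.
    """
    z = 1.0 if rng.random() < theta else 0.0
    return x + z * f(x)


def projected_step(theta: float, grad: float, lr: float) -> float:
    """One projected gradient step: descend, then clip back into [0, 1].

    In projected SGD, grad would be a stochastic (mini-batch) estimate;
    here it is a deterministic toy value.
    """
    return project(theta - lr * grad)


def train_gates(usefulness: List[float], lam: float = 0.1, lr: float = 0.2,
                steps: int = 50, init: float = 0.5) -> List[float]:
    """Optimize one keep-probability per section on a HYPOTHETICAL toy objective.

    Toy objective per section: (lam - u) * theta, i.e. a per-layer cost lam
    minus a usefulness score u. It is linear in theta, so its gradient is the
    constant (lam - u); this stands in for the paper's variational objective.
    """
    thetas = [init] * len(usefulness)
    for _ in range(steps):
        thetas = [projected_step(t, lam - u, lr) for t, u in zip(thetas, usefulness)]
    return thetas


def prune_sections(thetas: List[float], threshold: float = 0.0) -> List[int]:
    """Indices of sections to keep; sections with theta <= threshold are removed."""
    return [i for i, t in enumerate(thetas) if t > threshold]


if __name__ == "__main__":
    thetas = train_gates([0.5, 0.0, 0.3])
    print(thetas)                  # [1.0, 0.0, 1.0]
    print(prune_sections(thetas))  # [0, 2]
```

Because the toy objective is linear in each keep-probability, the projection drives every parameter to an endpoint of [0, 1]. This loosely mirrors the abstract's claim that solutions describe deterministic networks. The cutoff `threshold=0.0` is likewise a hypothetical choice, not the practical pruning condition derived in the paper.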

Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks

SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference

Layer-Specific Optimization: Sensitivity Based Convolution Layers Basis Search

Concurrent Training and Layer Pruning of Deep Neural Networks

Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

Density-embedding layers: a general framework for adaptive receptive fields

Field theory for optimal signal propagation in ResNets

Layer-wise synapse optimization for implementing neural networks on general neuromorphic architectures

An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture

Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization

LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

Training Deep Capsule Networks with Residual Connections

Optimal Algorithm for a Multilayer Feedforward Net and Its Application

NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

LOss-Based SensiTivity rEgulaRization: towards deep sparse neural networks

Rethinking Residual Connection with Layer Normalization

LayerOut: Freezing Layers in Deep Neural Networks

Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery

NeuralSens: Sensitivity Analysis of Neural Networks

Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data