Abstract: Deep convolutional neural networks (DCNNs) have led to significant breakthroughs in deep learning. However, larger models and larger datasets result in longer training times, which slows the development of deep learning. In this paper, following the idea of domain decomposition methods, we propose and study a new method to parallelize the training of DCNNs by decomposing and composing DCNNs. First, a global network is decomposed into several sub-networks by partitioning the width of the network (i.e., along the channel dimension) while keeping the depth constant. All the sub-networks are trained individually and in parallel, without any interprocessor communication, on the correspondingly decomposed samples of the input data. Then, following the idea of nonlinear preconditioning, we propose a sub-network transfer learning strategy in which the weights of the trained sub-networks are recomposed to initialize the global network, which is then trained further to adapt the parameters. We provide theoretical analyses to show the effectiveness of the sub-network transfer learning strategy. More precisely, we prove that (1) the initialized global network can extract the feature maps learned by the sub-networks, and (2) the initialization of the global network provides an upper bound and a lower bound for the cost function and the classification accuracy in terms of the corresponding values of the trained sub-networks. Experiments are provided to evaluate the proposed methods. The results show that the sub-network transfer learning strategy can indeed provide a good initialization and accelerate the training of the global network. Additionally, after further training, the transfer learning strategy shows almost no loss of accuracy, and the accuracy is sometimes higher than when the network is initialized randomly.
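
The decompose-then-recompose idea can be illustrated with a toy sketch. This is not the paper's implementation: it assumes that each layer acts as a plain matrix on a channel vector (a stand-in for a convolution), that the sub-networks are recomposed block-diagonally, and that the cross-channel weights between sub-networks start at zero. The names `forward`, `block_diag`, and `compose`, and all weight values, are hypothetical. Under those assumptions, the composed global network reproduces the concatenated outputs of the sub-networks, in the spirit of claim (1) of the abstract.

```python
# Toy sketch (assumptions: matrix layers stand in for convolutions,
# block-diagonal recomposition, zero-initialized cross-channel weights).

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(layers, x):
    """Apply each weight matrix followed by ReLU."""
    for W in layers:
        x = relu(matvec(W, x))
    return x

def block_diag(A, B):
    """Place A and B on the diagonal; cross-channel blocks are zero."""
    a_cols, b_cols = len(A[0]), len(B[0])
    top = [row + [0.0] * b_cols for row in A]
    bottom = [[0.0] * a_cols + row for row in B]
    return top + bottom

def compose(sub_a, sub_b):
    """Recompose two sub-networks of equal depth, layer by layer."""
    return [block_diag(Wa, Wb) for Wa, Wb in zip(sub_a, sub_b)]

# Hypothetical "trained" sub-networks: same depth, different widths.
sub_a = [[[1.0, -2.0], [0.5, 1.0]], [[1.0, 1.0]]]  # channels 2 -> 2 -> 1
sub_b = [[[2.0], [-1.0]], [[0.5, 3.0]]]            # channels 1 -> 2 -> 1

# Input decomposed along the channel dimension.
x = [3.0, 1.0, 2.0]
x_a, x_b = x[:2], x[2:]

global_net = compose(sub_a, sub_b)
sub_features = forward(sub_a, x_a) + forward(sub_b, x_b)
global_features = forward(global_net, x)
print(global_features == sub_features)
```

In this sketch the two sub-networks never exchange information, which mirrors the communication-free parallel training stage; subsequent training of the global network would then be free to move the zero cross-channel blocks away from zero.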

Online Block Layer Decomposition schemes for training Deep Neural Networks

Block Layer Decomposition schemes for training Deep Neural Networks

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training

On the Flexibility of Block Coordinate Descent for Large-Scale Optimization

Block-cyclic stochastic coordinate descent for deep neural networks

Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

Practical Quasi-Newton Methods for Training Deep Neural Networks

Block Coordinate Descent Methods for Structured Nonconvex Optimization with Nonseparable Constraints: Optimality Conditions and Global Convergence

Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks

0/1 Deep Neural Networks via Block Coordinate Descent

Distributed Newton Methods for Deep Neural Networks

Online Deep Learning: Learning Deep Neural Networks on the Fly

Training DNNs with Hybrid Block Floating Point

Decomposition and Composition of Deep Convolutional Neural Networks and Training Acceleration Via Sub-Network Transfer Learning

Deeply Supervised Block-Wise Neural Architecture Search

Block-wise Training of Residual Networks via the Minimizing Movement Scheme

Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning

Evolutionary Shallowing Deep Neural Networks at Block Levels

Online Learning for DNN Training: A Stochastic Block Adaptive Gradient Algorithm