Abstract: Modern deep convolutional neural networks (CNNs) are often designed to be scalable, which gives rise to the concept of a model family: a large (possibly infinite) collection of related neural network architectures. The isomorphism of a model family refers to the fact that the models within it share the same high-level structure, and models within the same family are called isomorphic models of one another. Existing weight initialization methods for CNNs rely on random or data-driven initialization. Although these methods can initialize networks satisfactorily, they rarely exploit the isomorphism of model families. This work proposes an isomorphic model-based initialization method (IM Init) for CNNs, which can initialize any network from another well-trained isomorphic model in the same model family. We first formulate the widely used general network structure of CNNs. We then present a structural weight transformation that transforms the weights between two isomorphic models. Finally, we apply IM Init to model down-sampling and up-sampling scenarios and confirm, through experiments on various image classification datasets, that it improves accuracy and convergence speed. In the model down-sampling scenario, IM Init initializes a smaller target model from a larger well-trained source model. It improves the accuracy of RegNet200MF by 1.59% on the CIFAR-100 dataset and by 1.9% on the CUB200 dataset. Conversely, in the model up-sampling scenario, IM Init initializes a larger target model from a smaller well-trained source model. It significantly speeds up the convergence of RegNet600MF and improves accuracy by 30.10% under short training schedules. Code will be available.
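
The abstract does not describe how the structural weight transformation works. Purely to illustrate what moving weights between a wider and a narrower convolution could involve, the sketch below uses a naive channel crop (down-sampling) and zero-pad (up-sampling). This is an assumed stand-in, not the IM Init transformation; the names `resize_conv_weight`, `zero_kernel`, `Kernel`, and `ConvWeight` are hypothetical, and weights are plain nested lists so the example has no dependencies.

```python
from typing import List

Kernel = List[List[float]]        # k x k spatial kernel
ConvWeight = List[List[Kernel]]   # indexed as [out_channel][in_channel][row][col]


def zero_kernel(k: int) -> Kernel:
    """Return a k x k kernel filled with zeros."""
    return [[0.0] * k for _ in range(k)]


def resize_conv_weight(src: ConvWeight, out_c: int, in_c: int) -> ConvWeight:
    """Naive illustration (NOT the paper's method): copy the overlapping
    channels from `src` and fill any new channels with zero kernels."""
    src_out, src_in = len(src), len(src[0])
    k = len(src[0][0])
    dst: ConvWeight = []
    for o in range(out_c):
        row = []
        for i in range(in_c):
            if o < src_out and i < src_in:
                # Channel exists in the source: copy its kernel.
                row.append([list(r) for r in src[o][i]])
            else:
                # Channel is new in the target: start from zeros.
                row.append(zero_kernel(k))
        dst.append(row)
    return dst


# Toy source weight: 2 output channels, 2 input channels, 1x1 kernels.
source = [[[[1.0]], [[2.0]]],
          [[[3.0]], [[4.0]]]]

down = resize_conv_weight(source, out_c=1, in_c=1)  # smaller target layer
up = resize_conv_weight(source, out_c=3, in_c=2)    # larger target layer
```

Cropping discards source channels and zero-padding adds channels that start out inactive; a transformation designed for isomorphic models would presumably treat both cases more carefully.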

Identical Initialization: A Universal Approach to Fast and Stable Training of Neural Networks

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Neuron Campaign for Initialization Guided by Information Bottleneck Theory

Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme

Isomorphic Model-Based Initialization for Convolutional Neural Networks

A Unified Weight Initialization Paradigm for Tensorial Convolutional Neural Networks

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

IKUN: Initialization to Keep snn training and generalization great with sUrrogate-stable variaNce

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

A Sober Look at Neural Network Initializations

Improving Classification Performance in Dendritic Neuron Models through Practical Initialization Strategies

A mathematical framework for improved weight initialization of neural networks using Lagrange multipliers

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

One Hyper-Initializer for All Network Architectures in Medical Image Analysis

A Type of Generalization Error Induced by Initialization in Deep Neural Networks

On the Crucial Role of Initialization for Matrix Factorization

On Symmetry and Initialization for Neural Networks

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Stability-Informed Initialization of Neural Ordinary Differential Equations

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice