Isomorphic Model-Based Initialization for Convolutional Neural Networks

Hong Zhang, Yang Li, Hanqing Yang, Bin He, Yu Zhang
DOI: https://doi.org/10.1016/j.jvcir.2022.103677
IF: 2.887
2022-01-01
Journal of Visual Communication and Image Representation
Abstract: Modern deep convolutional neural networks (CNNs) are often designed to be scalable, leading to the model family concept. A model family is a large (possibly infinite) collection of related neural network architectures. The isomorphism of a model family refers to the fact that the models within it share the same high-level structure, and the models within a family are called isomorphic models of one another. Existing weight initialization methods for CNNs use random initialization or data-driven initialization. Even though these methods can perform satisfactory initialization, the isomorphism of model families is rarely explored. This work proposes an isomorphic model-based initialization method (IM Init) for CNNs. It can initialize any network with another well-trained isomorphic model in the same model family. We first formulate the widely used general network structure of CNNs. Then a structural weight transformation is presented to transform the weights between two isomorphic models. Finally, we apply IM Init to the model down-sampling and up-sampling scenarios and confirm its effectiveness in improving accuracy and convergence speed through experiments on various image classification datasets. In the model down-sampling scenario, IM Init initializes the smaller target model with a larger well-trained source model. It improves the accuracy of RegNet200MF by 1.59% on the CIFAR-100 dataset and 1.9% on the CUB200 dataset. Conversely, in the model up-sampling scenario, IM Init initializes the larger target model with a smaller well-trained source model. It significantly speeds up the convergence of RegNet600MF and improves the accuracy by 30.10% under short training schedules. Code will be available.
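The abstract does not spell out the structural weight transformation, so the following is only an illustrative sketch of what moving weights between two isomorphic convolution layers of different widths could look like. It assumes a simple crop-or-pad rule on the channel dimensions; the helper name `resize_conv_weight`, the noise scale `std`, and the crop-from-the-front choice are all hypothetical and are not taken from the paper.

```python
import numpy as np


def resize_conv_weight(source, target_shape, rng=None, std=0.01):
    """Copy a conv kernel (out_ch, in_ch, kh, kw) into a tensor of target_shape.

    Hypothetical rule, NOT the paper's transformation: overlapping channels
    are copied from the source, and any extra target channels keep small
    Gaussian noise. Kernel spatial sizes must match.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    if source.shape[2:] != tuple(target_shape[2:]):
        raise ValueError("kernel spatial sizes must match")

    # Start from a small random tensor so new channels are not all zero.
    target = rng.normal(0.0, std, size=target_shape).astype(source.dtype)

    # Copy the block of channels that both layers have in common.
    out_ch = min(source.shape[0], target_shape[0])
    in_ch = min(source.shape[1], target_shape[1])
    target[:out_ch, :in_ch] = source[:out_ch, :in_ch]
    return target


# Down-sampling: a wide source layer initializes a narrower target layer.
wide = np.arange(8 * 4 * 3 * 3, dtype=np.float32).reshape(8, 4, 3, 3)
narrow = resize_conv_weight(wide, (4, 2, 3, 3))

# Up-sampling: a narrow source layer initializes a wider target layer.
wider = resize_conv_weight(narrow, (6, 3, 3, 3))
```

In this sketch the down-sampled layer is an exact sub-block of the source, while the up-sampled layer reuses the source weights and leaves the remaining channels near zero. A real method would also need to handle depth differences and normalization layers, which this example ignores.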