Structure injected weight normalization for training deep networks

Xu Yuan, Xiangjun Shen, Sumet Mehta, Teng Li, Shiming Ge, Zhengjun Zha
DOI: https://doi.org/10.1007/s00530-021-00793-7
IF: 3.9
2021-04-27
Multimedia Systems
Abstract: Weight normalization (WN) can help to stabilize the distribution of activations over layers, which boosts the generalization performance of DNNs. In this paper, we further propose deep structural weight normalization (DSWN) methods, which inject measurements of the network structure into WN to account fully for data propagation through the neural network. In DSWN, two novel structural measurements are developed to impose regularity on each network weight using different penalty matrices. The first is the sparsity measurement (DSWN-SM), in which L1,2 weight regularization is applied to promote competition for features between network weights, yielding a sparse network that is finally pruned. The second is the neuron measurement (DSWN-NM), which uses the L2 norm of each weight column to scale the importance of each intermediate neuron up or down, which accelerates network convergence. Extensive experiments are performed on several benchmark image datasets using fully connected and convolutional neural networks, and the proposed DSWN-SM and DSWN-NM methods are compared with state-of-the-art sparsity and weight normalization methods. The results show that DSWN-SM can reduce the number of trainable parameters while maintaining high accuracy, whereas DSWN-NM can accelerate convergence while improving the performance of deep networks.
computer science, information systems, theory & methods
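The quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's DSWN formulation, which the abstract does not spell out. It shows standard weight normalization (each weight vector rewritten as a learned scale times a unit direction), the per-column L2 norm that the neuron measurement is described as using, and one common convention for an L1,2 penalty (the squared L1 norm of each group, summed over groups). The function names, the choice of one column per neuron, and the row-wise grouping in the penalty are all assumptions made for illustration.

```python
import math

def column_norms(W):
    """L2 norm of each column of W, where W is a list of rows."""
    n_cols = len(W[0])
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(n_cols)]

def weight_normalize(V, g):
    """Standard weight normalization per column: w_j = g_j * v_j / ||v_j||_2.

    Assumes each column holds the weights of one neuron and that no
    column of V is all zeros (otherwise the division fails).
    """
    norms = column_norms(V)
    return [[g[j] * row[j] / norms[j] for j in range(len(row))] for row in V]

def l12_penalty(W):
    """Illustrative L1,2 penalty: sum over rows of (L1 norm of the row)^2.

    The grouping by rows is an assumption; the abstract does not state
    which groups the paper uses.
    """
    return sum(sum(abs(x) for x in row) ** 2 for row in W)

# Hypothetical 2x2 direction matrix V and per-column scales g.
V = [[3.0, 0.0], [4.0, 2.0]]
g = [1.0, 0.5]
W = weight_normalize(V, g)
```

After normalization, the L2 norm of each column of `W` equals the corresponding entry of `g`, so the scale of each neuron's weights is controlled by a single parameter. The squared-L1 form of the penalty is the one usually associated with competition between weights inside a group, which is why it was chosen for this sketch.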