A generic shift-norm-activation approach for deep learning

Zhi Chen, Pin-Han Ho
DOI: https://doi.org/10.1016/j.patcog.2020.107609
IF: 8
2021-01-01
Pattern Recognition
Abstract: Deep learning has received increasing attention in the last decade. Its remarkable success is partly attributed to the evolution of normalization and activation techniques. However, few works have been devoted to exploring both modules together. This work therefore aims to develop a deeper analytical understanding of the combined effect of normalization and activation. We design a generic method that integrates normalization and activation as a whole, named the Generic Shift-Normalization-Activation Approach (GSNA), to preserve richer information propagation in neural networks. A rigorous mathematical analysis was performed to investigate the benefits of the designed method, such as its computational complexity, its performance potential, and the optimization of trainable parameter initialization. Further, extensive experiments demonstrate the superiority and generality of the designed method on many computer vision benchmarks, such as CIFAR-10/100, SVHN, ImageNet32×32, etc. To explore its generality further, we also conduct experiments on natural language understanding tasks such as text classification and natural language inference, as well as on a variational generative task. More interestingly, GSNA can be naturally incorporated into existing neural networks with arbitrary architectures, demonstrating its generic effectiveness in deep learning.
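The abstract does not give GSNA's equations, so the following is only a hypothetical illustration of the general idea of treating normalization and activation as one fused step; it is not the paper's formulation. The function name `shift_norm_activate` and its parameters `gamma`, `beta`, and `shift` are assumptions made for this sketch. It normalizes one feature over a batch (population variance, as in batch normalization), applies a learnable affine transform, and then applies a ReLU whose threshold is moved by `shift`.

```python
import math
from typing import List


def shift_norm_activate(
    xs: List[float],
    gamma: float = 1.0,
    beta: float = 0.0,
    shift: float = 0.0,
    eps: float = 1e-5,
) -> List[float]:
    """Hypothetical fused normalize-then-activate step for one feature.

    Illustrative composition only; NOT the GSNA formulation from the paper.
    """
    n = len(xs)
    if n == 0:
        return []
    mean = sum(xs) / n
    # Biased (population) variance over the batch, as batch normalization uses.
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    out = []
    for x in xs:
        x_hat = (x - mean) * inv_std      # zero mean, unit variance
        z = gamma * x_hat + beta          # learnable affine transform
        out.append(max(0.0, z - shift))   # ReLU with its threshold moved by `shift`
    return out


if __name__ == "__main__":
    print(shift_norm_activate([1.0, 2.0, 3.0, 4.0], shift=0.5))
```

The shift is applied after normalization because a constant offset added before mean subtraction is removed by the normalization itself: `[11.0, 12.0, 13.0, 14.0]` produces the same output as `[1.0, 2.0, 3.0, 4.0]`. In this single-ReLU form, `shift` could also be folded into `beta`; it is kept separate here only to make the shift step visible.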
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic