On Centralization and Unitization of Batch Normalization for Deep ReLU Neural Networks

Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
DOI: https://doi.org/10.1109/tsp.2024.3410291
IF: 4.875
2024-01-01
IEEE Transactions on Signal Processing
Abstract: Batch normalization (BN) enhances the training of deep ReLU neural networks with a composition of mean centering (centralization) and variance scaling (unitization). Despite the success of BN, a theoretical explanation that elucidates its effects on training dynamics and guides the design of normalization methods is still lacking. In this paper, we elucidate the effects of centralization and unitization in BN on training deep ReLU neural networks. We first reveal that feature centralization in BN stabilizes the correlation coefficients of features in unnormalized ReLU neural networks, which achieves feature decorrelation and accelerates convergence in training. We demonstrate that weight centralization, which subtracts the means from the weight parameters, is equivalent to BN in feature decorrelation and achieves the same linear convergence rate in training. Subsequently, we show that feature unitization in BN enables a dynamic learning rate that varies inversely with the norm of the features during training, and we propose an adaptive loss function to emulate feature unitization. Furthermore, we apply the theoretical results to develop an efficient alternative to BN using a simple combination of weight centralization and the proposed adaptive loss function. Extensive experiments show that the proposed method achieves classification accuracy comparable to BN while clearly reducing memory consumption, and that it outperforms normalization-free methods in image classification. We further extend weight centralization to enable small-batch training for object detection networks.
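The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array shapes, and in particular the choice to take each weight mean over the input (fan-in) axis are assumptions made for illustration. The adaptive loss function is not sketched because the abstract does not define it.

```python
import numpy as np

def feature_centralize(X):
    # Centralization: subtract the per-feature mean over the batch axis.
    return X - X.mean(axis=0, keepdims=True)

def feature_unitize(X, eps=1e-5):
    # Unitization: divide by the per-feature standard deviation over the batch.
    return X / np.sqrt(X.var(axis=0, keepdims=True) + eps)

def batch_norm(X, eps=1e-5):
    # BN without the learnable scale and shift: centralize, then unitize.
    return feature_unitize(feature_centralize(X), eps)

def weight_centralize(W):
    # Assumption: W has shape (out_features, in_features) and the mean of
    # each output unit's weight vector is subtracted from that vector.
    return W - W.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# ReLU outputs are nonnegative, so their mean is nonzero.
X = np.maximum(rng.normal(size=(64, 8)), 0.0)
W = rng.normal(size=(4, 8))
Wc = weight_centralize(W)

Y = batch_norm(X)

# Each row of Wc sums to zero, so a constant offset shared by all input
# features leaves the pre-activations unchanged.
shift_invariant = np.allclose((X + 3.0) @ Wc.T, X @ Wc.T)
```

Under these assumptions, the centered weights cancel any offset shared across the input features without computing batch statistics. That only hints at why weight centralization might stand in for feature centralization; the equivalence in feature decorrelation and convergence rate is the paper's own result and is not shown by this sketch.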