Abstract: Regularization is a highly effective approach to mitigating overfitting in neural networks and improves the generalization ability of the model. However, the working mechanisms of regularization algorithms and their impact on model performance have not been fully explored. In this paper, we study and analyze them using information bottleneck theory and a theory of the human brain's sensory system. We propose a metric, named the AEntry value, to characterise the encoding length of hidden layers. We then run extensive experiments on the MNIST and FashionMNIST datasets with several commonly used regularization algorithms and calculate the corresponding AEntry values. From these results we draw three conclusions. (1) Introducing regularization influences how the neural network encodes features relevant to the prediction task. Early stopping avoids introducing task-irrelevant information into the model by halting training at an appropriate iteration. Laplace, Gaussian, and Sparse Response regularizations compress the task-relevant representation and improve performance by introducing prior information into the model. In contrast, Dropout, Batch Normalization, and Layer Normalization increase the encoding length of features, improving performance through redundant representation. (2) The encoding of the neural network does not satisfy the data processing inequality of information theory, mainly because of redundant coding of the extracted features. (3) Overfitting is caused by introducing information that is irrelevant to the target. These results offer insight for building more efficient regularization algorithms to improve the performance of neural network models.
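As a rough illustration of three of the regularizers named in the abstract, the sketch below assumes the standard correspondence between a Laplace prior and an L1 weight penalty and between a Gaussian prior and an L2 weight penalty, plus a simple patience-based early stopping rule. The function names, the `lam` and `patience` parameters, and the example values are hypothetical and are not taken from the paper, which may define these regularizers differently.

```python
# Illustrative sketch only: all names and values are assumptions, not the paper's code.

def l1_penalty(weights, lam):
    """Laplace-prior style penalty: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Gaussian-prior style penalty: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience):
    """Return the epoch with the best validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never examined
    return best_epoch


if __name__ == "__main__":
    w = [1.0, -2.0, 0.5]
    print(l1_penalty(w, 0.1), l2_penalty(w, 0.1))
    print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.6], patience=2))
```

In a training loop, either penalty would be added to the task loss before computing gradients, while the early stopping rule would select which checkpoint to keep.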

Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance

SparseConnect: Regularising CNNs on Fully Connected Layers

Neighborhood Region Smoothing Regularization for Finding Flat Minima in Deep Neural Networks

Adaptive Regularization of Labels

An Improving Framework of regularization for Network Compression

Regularizing Deep Convolutional Neural Networks with a Structured Decorrelation Constraint

Overfitting Remedy by Sparsifying Regularization on Fully-Connected Layers of CNNs

Subdomain contraction in deep networks for robust representation learning

ACLS: Adaptive and Conditional Label Smoothing for Network Calibration

Regularizing Deep Networks Using Efficient Layerwise Adversarial Training

Temporal Calibrated Regularization for Robust Noisy Label Learning

Penetrating the influence of regularizations on neural network based on information bottleneck theory

On Connections Between Regularizations for Improving DNN Robustness

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Consistency of Neural Networks with Regularization

A Partial Regularization Method for Network Compression

On Regularization for Explaining Graph Neural Networks: An Information Theory Perspective

Deep Fuzzy Clustering Network With Matrix Norm Regularization

Learning with Noisy Labels Via Sparse Regularization

Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach