Abstract: Regularization is an effective technique for mitigating overfitting in neural networks and improving the generalization ability of the model. However, the working mechanisms of regularization methods and their impact on model performance have not been fully explored. In this paper, we study and analyze them using information bottleneck theory and a theory of the human brain's sensory system. We propose a metric, named the AEntry value, to characterize the encoding length of hidden layers. We then conduct extensive experiments on the MNIST and FashionMNIST datasets with several commonly used regularization algorithms and calculate the corresponding AEntry values. Analyzing these results, we draw three conclusions. (1) Introducing regularization influences how a neural network encodes features relevant to the prediction task. Early stopping keeps task-irrelevant information out of the model by halting training at an appropriate iteration. Laplace, Gaussian, and Sparse Response regularizations compress the task-relevant representation and improve network performance by introducing prior information into the model. In contrast, Dropout, Batch Normalization, and Layer Normalization increase the encoding length of features, improving performance through redundant representations. (2) The encoding of a neural network does not satisfy the data processing inequality of information theory, mainly because of redundant coding of the extracted features. (3) Overfitting is caused by introducing information that is irrelevant to the target. These results offer insight into building more efficient regularization algorithms to improve the performance of neural network models.
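
The abstract attributes the gains from Laplace and Gaussian regularization to the prior information they introduce. The sketch below illustrates the standard correspondence behind that phrasing, assuming (as is common, though the abstract does not state it) that these names refer to L1 and L2 weight penalties read as maximum a posteriori estimation with zero-mean Laplace and Gaussian priors. All function names are hypothetical, the scalar setting is a deliberate simplification, and nothing here computes the paper's AEntry value or reproduces its experiments.

```python
import math


def laplace_neg_log_prior(w, b):
    # -log p(w) for a zero-mean Laplace prior with scale b: p(w) = exp(-|w|/b) / (2b)
    return abs(w) / b + math.log(2.0 * b)


def gaussian_neg_log_prior(w, sigma):
    # -log p(w) for a zero-mean Gaussian prior with standard deviation sigma
    return w * w / (2.0 * sigma ** 2) + 0.5 * math.log(2.0 * math.pi * sigma ** 2)


def map_l1(a, lam):
    # argmin_w 0.5*(w - a)**2 + lam*|w|: soft-thresholding, small values become exactly 0
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def map_l2(a, lam):
    # argmin_w 0.5*(w - a)**2 + lam*w**2: every value is shrunk by the same factor
    return a / (1.0 + 2.0 * lam)


if __name__ == "__main__":
    b, sigma = 0.5, 0.5
    lam_l1 = 1.0 / b                   # Laplace scale b corresponds to L1 strength 1/b
    lam_l2 = 1.0 / (2.0 * sigma ** 2)  # Gaussian std sigma corresponds to L2 strength 1/(2 sigma^2)

    # Negative log-prior minus penalty is the same constant for every w, so adding
    # either one to a data loss yields the same minimizer.
    for w in (-1.5, 0.0, 0.7):
        print(w,
              laplace_neg_log_prior(w, b) - lam_l1 * abs(w),
              gaussian_neg_log_prior(w, sigma) - lam_l2 * w * w)

    # Unregularized estimates a: L1 zeroes the small ones, L2 scales all of them down.
    estimates = [2.5, 0.3, -0.2]
    print("L1:", [map_l1(a, 0.5) for a in estimates])
    print("L2:", [map_l2(a, 0.5) for a in estimates])
```

Run as a script, the first loop prints the same gap between each negative log-prior and its penalty for every weight, and the last two lines show the qualitative difference between the two priors: L1 sets the small estimates exactly to zero, while L2 only rescales them. Sparse Response regularization is not covered by this sketch.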

The Efficacy of Regularization in Two Layer Neural Networks

$\ell_1$ Regularization in Two-Layer Neural Networks

Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off

Regularization theory in the study of generalization ability of a biological neural network model

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

Consistency of Neural Networks with Regularization

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Penetrating the influence of regularizations on neural network based on information bottleneck theory

Regularization-wise double descent: Why it occurs and how to eliminate it

Deep Network Regularization via Bayesian Inference of Synaptic Connectivity

Neuron with Steady Response Leads to Better Generalization

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

Generalization Error Analysis of Neural networks with Gradient Based Regularization

How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Generalization for Least Squares Regression With Simple Spiked Covariances

Regularization for Adversarial Robust Learning

On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Theory IIIb: Generalization in Deep Networks

On the Generalization Error Bounds of Neural Networks under Diversity-Inducing Mutual Angular Regularization