Abstract: Regularization is an effective technique for reducing overfitting in neural networks and improving their generalization ability. However, the working mechanisms of regularization algorithms and their impact on model performance have not been fully explored. In this paper, we study and analyze them using information bottleneck theory and a theory of the human brain's sensory system. We propose a metric, named the AEntry value, to characterize the encoding length of hidden layers. We then conduct extensive experiments on the MNIST and FashionMNIST datasets with several commonly used regularization algorithms and calculate the corresponding AEntry values. Analyzing these results, we draw three conclusions. (1) Introducing regularization influences how a neural network encodes features relevant to the prediction task. Early stopping prevents task-irrelevant information from entering the model by halting training after an appropriate number of iterations. Laplace, Gaussian, and Sparse Response regularizations compress the task-relevant representation and improve the network's performance by introducing prior information into the model. In contrast, Dropout, Batch Normalization, and Layer Normalization increase the encoding length of features, improving performance through redundant representations. (2) The encoding of a neural network does not satisfy the data processing inequality of information theory, mainly because of redundant coding of the extracted features. (3) Overfitting is caused by introducing information that is irrelevant to the target. These results offer insight into building more efficient regularization algorithms to improve the performance of neural network models.
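The prior-based regularizers and early stopping named in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under common textbook assumptions, not the paper's implementation: the Laplace and Gaussian regularizers are taken to be L1 and squared-L2 weight penalties (the negative log of an i.i.d. Laplace or zero-mean Gaussian prior, up to a constant), Sparse Response regularization is assumed to be an L1 penalty on hidden-layer activations, and early stopping is assumed to use a patience rule on validation loss. All names (`laplace_penalty`, `gaussian_penalty`, `sparse_response_penalty`, `EarlyStopping`) and the parameters `lam`, `patience`, and `min_delta` are hypothetical. The sketch does not compute the AEntry value.

```python
import numpy as np


def laplace_penalty(weights, lam):
    """L1 penalty: negative log of an i.i.d. Laplace prior on the weights, up to a constant."""
    return lam * sum(np.abs(w).sum() for w in weights)


def gaussian_penalty(weights, lam):
    """Squared L2 penalty: negative log of an i.i.d. zero-mean Gaussian prior, up to a constant."""
    return lam * sum((w ** 2).sum() for w in weights)


def sparse_response_penalty(hidden, lam):
    """Assumed form of Sparse Response regularization: L1 norm of each sample's
    hidden activations (shape: batch x units), averaged over the batch."""
    return lam * np.abs(hidden).sum(axis=1).mean()


class EarlyStopping:
    """Signal a stop once validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.best_epoch = epoch
            self.wait = 0
        else:
            # No improvement: count one more epoch without progress.
            self.wait += 1
        return self.wait >= self.patience
```

In a training loop, one of the penalty terms would be added to the task loss before computing gradients, and `EarlyStopping.step` would be called once per epoch; the weights saved at `best_epoch` are the ones kept. Averaging the activation penalty over the batch keeps its scale independent of batch size.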

Effective Neural Network $L_0$ Regularization With BinMask

Learning Sparse Neural Networks through $L_0$ Regularization

Effective Sparsification of Neural Networks with Global Sparsity Constraint

Evaluating Model Robustness Using Adaptive Sparse $L_0$ Regularization

Elastic Net with Adaptive Weight for Image Denoising

Training Compact DNNs with $l_{1/2}$ Regularization

Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural Networks

Neural Network for a Class of Sparse Optimization with $L_0$-Regularization

Sparse-Input Neural Network using Group Concave Regularization

$L_0$ Regularization Based Neural Network Design and Compression

Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

A Hidden Feature Selection Method Based on $L_{2,0}$-Norm Regularization for Training Single-hidden-layer Neural Networks

Penetrating the Influence of Regularizations on Neural Network Based on Information Bottleneck Theory

Training a Neural Network for Data Reduction and Better Generalization

The Role of Regularization in Shaping Weight and Node Pruning Dependency and Dynamics

Dep-$L_0$: Improving $L_0$-based Network Sparsification via Dependency Modeling

$L_0$-ARM: Network Sparsification via Stochastic Binary Optimization

Learning Broad Learning System with Controllable Sparsity Through $L_0$ Regularization

Efficient Construction of Sparse Radial Basis Function Neural Networks Using $L_1$-Regularization

A Simple Neural Network for Sparse Optimization with $l_1$ Regularization

Sparse Deep Neural Networks Using $L_{1,\infty}$-Weight Normalization