Geometry-induced Implicit Regularization in Deep ReLU Neural Networks
Joachim Bona-Pellissier, Fran çois Malgouyres, Fran çois Bachoc
2024-02-13
Abstract:It is well known that neural networks with many more parameters than training
examples do not overfit. Implicit regularization phenomena, which are still not
well understood, occur during optimization and 'good' networks are favored.
Thus the number of parameters is not an adequate measure of complexity if we do
not consider all possible networks but only the 'good' ones. To better
understand which networks are favored during optimization, we study the
geometry of the output set as parameters vary. When the inputs are fixed, we
prove that the dimension of this set changes and that the local dimension,
called batch functional dimension, is almost surely determined by the
activation patterns in the hidden layers. We prove that the batch functional
dimension is invariant to the symmetries of the network parameterization:
neuron permutations and positive rescalings. Empirically, we establish that the
batch functional dimension decreases during optimization. As a consequence,
optimization leads to parameters with low batch functional dimensions. We call
this phenomenon geometry-induced implicit regularization.The batch functional
dimension depends on both the network parameters and inputs. To understand the
impact of the inputs, we study, for fixed parameters, the largest attainable
batch functional dimension when the inputs vary. We prove that this quantity,
called computable full functional dimension, is also invariant to the
symmetries of the network's parameterization, and is determined by the
achievable activation patterns. We also provide a sampling theorem, showing a
fast convergence of the estimation of the computable full functional dimension
for a random input of increasing size. Empirically we find that the computable
full functional dimension remains close to the number of parameters, which is
related to the notion of local identifiability. This differs from the observed
values for the batch functional dimension computed on training inputs and test
inputs. The latter are influenced by geometry-induced implicit regularization.
Artificial Intelligence,Machine Learning,Neural and Evolutionary Computing,Optimization and Control,Statistics Theory