Abstract: We introduce a Gaussian Process (GP) generalization of ResNets (with unknown functions of the network replaced by GPs and identified via MAP estimation), which includes ResNets (trained with L2 regularization on weights and biases) as a particular case (when employing particular kernels). We show that ResNets (and their warping GP regression extension) converge, in the infinite depth limit, to a generalization of image registration variational algorithms. In this generalization, images are replaced by functions mapping input/output spaces to a space of unexpressed abstractions (ideas), and material points are replaced by data points. Whereas computational anatomy aligns images via warping of the material space, this generalization aligns ideas (or abstract shapes as in Plato's theory of forms) via the warping of the Reproducing Kernel Hilbert Space (RKHS) of functions mapping the input space to the output space. While the Hamiltonian interpretation of ResNets is not new, it was based on an Ansatz. We do not rely on this Ansatz and present the first rigorous proof of convergence of ResNets with trained weights and biases towards a Hamiltonian dynamics driven flow. Since our proof is constructive and based on discrete and continuous mechanics, it reveals several remarkable properties of ResNets and their GP generalization. ResNet regressors are kernel regressors with data-dependent warping kernels. Minimizers of L2 regularized ResNets satisfy a discrete least action principle implying the near preservation of the norm of weights and biases across layers. The trained weights of ResNets with scaled/strong L2 regularization can be identified by solving an autonomous Hamiltonian system. The trained ResNet parameters are unique up to (a function of) the initial momentum, and the initial momentum representation of those parameters is generally sparse. The kernel (nugget) regularization strategy provides a provably robust alternative to Dropout for ANNs.
We introduce a functional generalization of GPs and show that pointwise GP/RKHS error estimates lead to probabilistic and deterministic generalization error estimates for ResNets. When performed with feature maps, the proposed analysis identifies the (EPDiff) mean-field limit of trained ResNet parameters as the number of data points goes to infinity. The search for good architectures can be reduced to that of good kernels, and we show that the composition of warping regression blocks with reduced equivariant multichannel kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and groups of transformations.
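As a rough illustration of the kernel (nugget) regularization mentioned in the abstract, the sketch below fits a plain kernel regressor f(x) = K(x, X)(K(X, X) + nugget·I)^{-1} Y on scalar inputs. It is not the paper's warping construction: the Gaussian kernel, the `lengthscale` parameter, and the helper names `gaussian_kernel`, `solve`, and `fit_kernel_regressor` are all assumptions made for this example.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel on scalars. The kernel choice is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_kernel_regressor(X, Y, nugget=1e-2, lengthscale=1.0):
    """Return x -> K(x, X) (K(X, X) + nugget * I)^{-1} Y."""
    n = len(X)
    # The nugget is added on the diagonal of the Gram matrix only.
    K = [
        [gaussian_kernel(X[i], X[j], lengthscale) + (nugget if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(K, Y)

    def predict(x):
        return sum(a * gaussian_kernel(x, xi, lengthscale)
                   for a, xi in zip(alpha, X))

    return predict


X = [0.0, 1.0, 2.0]
Y = [0.0, 1.0, 0.0]
interpolant = fit_kernel_regressor(X, Y, nugget=1e-8)  # nearly interpolates the data
smoothed = fit_kernel_regressor(X, Y, nugget=1.0)      # shrinks the fit toward zero
```

With a near-zero nugget the regressor reproduces the training labels almost exactly. A larger nugget trades data fit for a smaller RKHS norm, which is the regularizing effect the abstract refers to.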

Regularization, early-stopping and dreaming: A Hopfield-like setup to address generalization and overfitting

Hebbian dreaming for small datasets

Typhus in Calcutta

Eigenvector Dreaming

Daydreaming Hopfield Networks and their surprising effectiveness on correlated data

Simultaneous embedding of multiple attractor manifolds in a recurrent neural network using constrained gradient optimization

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Do ideas have shape? Idea registration as the continuous limit of artificial neural networks

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Brain memory working. Optimal control behavior for improved Hopfield-like models

Early Stopping of Untrained Convolutional Neural Networks

Regularization theory in the study of generalization ability of a biological neural network model

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

A new mechanical approach to handle generalized Hopfield neural networks

Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

Hopfield model with planted patterns: a teacher-student self-supervised learning model

Hessian Regularization of Deep Neural Networks: A Novel Approach Based on Stochastic Estimators of Hessian Trace

Hebbian Learning from First Principles

On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations