A theory of data variability in Neural Network Bayesian inference

Javed Lindner, David Dahmen, Michael Krämer, Moritz Helias

2023-11-09

Abstract: Bayesian inference and kernel methods are well established in machine learning. The neural network Gaussian process in particular provides a concept to investigate neural networks in the limit of infinitely wide hidden layers by using kernel and inference methods. Here we build upon this limit and provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks. We systematically compute generalization properties of linear, non-linear, and deep non-linear networks for kernel matrices with heterogeneous entries. In contrast to currently employed spectral methods, we derive the generalization properties from the statistical properties of the input, elucidating the interplay of input dimensionality, size of the training data set, and variability of the data. We show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Using our formalism on a synthetic task and on MNIST, we obtain a homogeneous kernel matrix approximation for the learning curve as well as corrections due to data variability, which allow the estimation of the generalization properties and exact results for the bounds of the learning curves in the case of infinitely many training data points.
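
For orientation, the Gaussian process inference that underlies the neural network Gaussian process limit can be written in its standard textbook form. The notation here is an assumption and is not taken from the paper: $K$ is the kernel matrix on the training inputs, $k_*$ is the vector of kernel values between a test input and the training inputs, $k_{**}$ is the kernel value of the test input with itself, $y$ is the vector of training labels, and $\sigma^2$ is an observation-noise (regularization) variance.

```latex
\begin{align}
  \langle y_* \rangle &= k_*^{\mathsf{T}} \left( K + \sigma^2 \mathbb{1} \right)^{-1} y, \\
  \operatorname{Var}(y_*) &= k_{**} - k_*^{\mathsf{T}} \left( K + \sigma^2 \mathbb{1} \right)^{-1} k_* .
\end{align}
```

Both the posterior mean and the posterior variance of the network output depend on the data only through kernel entries, so heterogeneity among the entries of $K$ enters through the matrix inverse.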

Disordered Systems and Neural Networks, Machine Learning

What problem does this paper attempt to address?

This paper asks how variability in the training data affects the generalization performance of infinitely wide neural networks within the Bayesian inference framework. The authors study this effect in the kernel limit of infinitely wide linear, non-linear, and deep non-linear networks. Using methods from statistical field theory, they derive approximate expressions for the predictive distribution. These expressions reveal the interplay of input dimensionality, training set size, and data variability, and they show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. The authors apply the formalism to a synthetic task and to MNIST to check the theoretical results. They obtain a homogeneous kernel matrix approximation of the learning curve together with corrections due to data variability, which allows them to estimate the generalization performance and to give exact bounds on the learning curves in the limit of infinitely many training data points. In short, the paper quantifies the impact of data variability on the generalization ability of infinitely wide neural networks.
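
The quantity discussed above, a learning curve under Bayesian inference in the kernel limit, can also be estimated by brute force. The following is a minimal sketch of that numerical baseline and not the paper's field-theoretic calculation. It assumes a single-layer linear network, whose kernel is taken here as $K(x, x') = w_\mathrm{var}\, x \cdot x' / d$, and a hypothetical two-class synthetic task. All function names (`linear_nngp_kernel`, `gp_predict`, `make_data`, `learning_curve`) and parameters (`spread`, `noise_var`, `w_var`) are illustrative assumptions.

```python
import numpy as np


def linear_nngp_kernel(X1, X2, w_var=1.0):
    """Kernel of a single-layer linear network (assumed normalization 1/d)."""
    d = X1.shape[1]
    return w_var * (X1 @ X2.T) / d


def gp_predict(K_train, K_cross, K_test_diag, y, noise_var):
    """Standard GP regression: posterior mean and variance at the test points."""
    A = K_train + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(A)                      # A = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # A^{-1} y
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)              # L^{-1} k_* for each test point
    var = K_test_diag - np.sum(v ** 2, axis=0)
    return mean, var


def make_data(n, d, spread, rng):
    """Hypothetical task: labels +1/-1, inputs scattered around +mu/-mu.

    `spread` sets the within-class standard deviation, i.e. the data variability.
    """
    y = rng.choice([-1.0, 1.0], size=n)
    mu = np.ones(d)
    X = y[:, None] * mu + spread * rng.standard_normal((n, d))
    return X, y


def learning_curve(train_sizes, d=50, spread=2.0, noise_var=0.1,
                   n_test=500, seed=0):
    """Mean squared test error of the GP posterior mean for each training size."""
    rng = np.random.default_rng(seed)
    X_test, y_test = make_data(n_test, d, spread, rng)
    K_test_diag = np.sum(X_test ** 2, axis=1) / d  # diagonal of the test kernel
    errors = []
    for n in train_sizes:
        X, y = make_data(n, d, spread, rng)
        K = linear_nngp_kernel(X, X)
        K_cross = linear_nngp_kernel(X_test, X)
        mean, _ = gp_predict(K, K_cross, K_test_diag, y, noise_var)
        errors.append(np.mean((mean - y_test) ** 2))
    return np.array(errors)


if __name__ == "__main__":
    sizes = [5, 20, 100, 400]
    for n, err in zip(sizes, learning_curve(sizes)):
        print(f"n = {n:4d}  test MSE = {err:.3f}")
```

Rerunning `learning_curve` with different values of `spread` gives an empirical view of how within-class variability shifts the curve. A theory of the kind summarized above aims to predict such curves from the statistics of the input instead of from repeated numerical inference.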

Towards a General Theory of Infinite-Width Limits of Neural Classifiers

Deep Kernel Posterior Learning under Infinite Variance Prior Weights

Central Limit Theorem for Bayesian Neural Network trained with Variational Inference

Critical feature learning in deep neural networks

A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit

A theory of representation learning gives a deep generalisation of kernel methods

Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance

Deep Neural Networks as Gaussian Processes

Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories

Dynamics of finite width Kernel and prediction fluctuations in mean field neural networks

Bayesian inference with finitely wide neural networks

Generalization in Kernel Regression Under Realistic Assumptions

Dynamically Stable Infinite-Width Limits of Neural Classifiers

Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Theory IIIb: Generalization in Deep Networks

Bayesian Inference with Deep Weakly Nonlinear Networks

On the Benefits of Invariance in Neural Networks

Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective