Abstract:We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows. * Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world. * We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions. Activation Functions are derived formally. We further endow geometry on NNs through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that "distance" between samples is contracted as layers stack up. * S-System shows NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions, it enables us to prove that for large size nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss errors, and regions around the minima are flat basins where all eigenvalues of Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations. * S-System, the information-geometry structure and the optimization behaviors combined completes the analog between Renormalization Group (RG) and NNs. It shows that a NN is a complex adaptive system that estimates the statistic dependency of microscopic object, e.g., pixels, in multiple scales. Unlike clear-cut physical quantity produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds emerging through learning/optimization that divide the sample space into highly semantically meaningful groups that are dictated by supervised labels (in supervised NNs).

Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective

Learning curves for deep structured Gaussian feature models

Learning Curves for Analysis of Deep Networks

Learning curves for deep structured Gaussian feature models *

Mapping the Learning Curves of Deep Learning Networks

An Analytical Theory of Curriculum Learning in Teacher-Student Networks

A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in the Wasserstein Space

A Study of the Mathematics of Deep Learning

Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

Deep Neural Networks via Complex Network Theory: a Perspective

Understanding Training and Generalization in Deep Learning by Fourier Analysis.

Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions

An exact mapping between the Variational Renormalization Group and Deep Learning

High-dimensional learning of narrow neural networks

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Simplified derivations for high-dimensional convex learning problems

A theory of data variability in Neural Network Bayesian inference

Deep Neural Networks as Gaussian Processes

PDE Models for Deep Neural Networks: Learning Theory, Calculus of Variations and Optimal Control

How Graph Neural Networks Learn: Lessons from Training Dynamics

Learning the gravitational force law and other analytic functions