TASI Lectures on Physics for Machine Learning

Jim Halverson
2024-08-01
Abstract: These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.
High Energy Physics - Theory, Machine Learning, High Energy Physics - Phenomenology
What problem does this paper attempt to address?
The paper addresses the theoretical foundations of neural networks in machine learning (ML), organized around three aspects:

1. **Expressivity**
   - Investigates which functions neural networks can represent or approximate. For example, the Universal Approximation Theorem (UAT) states that a network with a single hidden layer can, given a suitable activation function and enough hidden units, approximate any continuous function on a compact domain to arbitrary accuracy.
   - Explores the Kolmogorov-Arnold theorem, which states that multivariate continuous functions can be represented as sums and compositions of one-dimensional continuous functions.
2. **Statistics**
   - Analyzes the statistical properties of neural networks at initialization, in particular how the distribution of the parameters determines the distribution of the network outputs.
   - Presents the neural network / Gaussian process (NNGP) correspondence: in the infinite-width limit, networks at initialization are draws from a Gaussian process.
   - Discusses non-Gaussian processes, where interactions arise when the assumptions of the central limit theorem are violated.
3. **Dynamics**
   - Explores how neural network parameters evolve over time, including their trajectories during training.
   - Introduces the Neural Tangent Kernel (NTK) to study how the network function changes during training.

Through these topics, the paper aims to provide a deeper understanding of neural networks and a theoretical foundation for future research. It also explores the connection between neural networks and physics, using a field-theoretic perspective to understand and analyze network behavior.
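The NNGP statement in the summary above can be checked numerically. The following is a minimal sketch, not code from the lectures: it assumes a one-hidden-layer network f(x) = N^{-1/2} Σ_i a_i tanh(w_i x + b_i) with all parameters drawn i.i.d. from N(0, 1), and uses the excess kurtosis of the output at a fixed input as a rough measure of non-Gaussianity. The function names, the choice of tanh, and the default values are illustrative assumptions.

```python
import math
import random


def random_network_output(x, width, rng):
    """One draw of f(x) = width**-0.5 * sum_i a_i * tanh(w_i * x + b_i).

    All parameters a_i, w_i, b_i are sampled i.i.d. from N(0, 1);
    this architecture and scaling are assumptions for illustration.
    """
    total = 0.0
    for _ in range(width):
        a = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        total += a * math.tanh(w * x + b)
    return total / math.sqrt(width)


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2**2 - 3, which vanishes for a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / m2 ** 2 - 3.0


def output_excess_kurtosis(width, n_draws=10000, x=1.0, seed=0):
    """Excess kurtosis of f(x) over n_draws independent random initializations."""
    rng = random.Random(seed)
    outputs = [random_network_output(x, width, rng) for _ in range(n_draws)]
    return excess_kurtosis(outputs)
```

Calling `output_excess_kurtosis(1)` and `output_excess_kurtosis(50)` compares a width-1 network with a width-50 one. Because the output is a normalized sum of `width` independent terms, its excess kurtosis is expected to fall off like 1/width, so the wider network should sit much closer to the Gaussian value of zero, up to sampling noise.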