Abstract: Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein's theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
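As a rough illustration of what a parametric redundancy in a neural ODE can look like, the sketch below reparametrizes only the time coordinate of a toy scalar ODE. This is the simplest special case of a diffeomorphism and is not taken from the paper, whose spacetime diffeomorphisms are more general and subject to conditions not reproduced here. The vector field `f`, the map `phi`, and all constants are hypothetical choices made for the example. If x(t) solves dx/dt = f(x, t) on [0, 1] and phi is a smooth, strictly increasing map with phi(0) = 0 and phi(1) = 1, then y(s) = x(phi(s)) solves dy/ds = phi'(s) f(y, phi(s)), so two different vector fields produce the same input-to-output map.

```python
import math


def rk4(field, x0, t0, t1, steps=2000):
    """Integrate dx/dt = field(x, t) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        k1 = field(x, t)
        k2 = field(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = field(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = field(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


def f(x, t):
    """Toy 'neural' vector field (hypothetical weights, for illustration only)."""
    return math.tanh(1.5 * x - 0.7 * t + 0.3)


def phi(s):
    """Time reparametrization: strictly increasing, phi(0) = 0, phi(1) = 1."""
    return s + 0.2 * math.sin(math.pi * s)


def dphi(s):
    """Derivative of phi; it stays positive because 0.2 * pi < 1."""
    return 1.0 + 0.2 * math.pi * math.cos(math.pi * s)


def g(x, s):
    """Transformed vector field: a different function of (x, s) than f."""
    return dphi(s) * f(x, phi(s))


x0 = 0.4
end_original = rk4(f, x0, 0.0, 1.0)
end_reparam = rk4(g, x0, 0.0, 1.0)

# The two fields disagree pointwise, yet the endpoint maps coincide.
fields_differ = abs(f(x0, 0.25) - g(x0, 0.25)) > 1e-3
endpoints_agree = abs(end_original - end_reparam) < 1e-8
```

In the language of the abstract, the endpoint map plays the role of the observable, while `f` and `g` are two descriptions related by a change of the time coordinate.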

What problem does this paper attempt to address?

This paper studies the internal symmetries of neural networks, in particular feedforward networks, neural ordinary differential equations (neural ODEs), and Transformers. The authors apply the concept of gauge symmetry from physics to neural network architectures: treating the model function as a physical observable, they interpret the redundancy of model parameters as a gauge symmetry. They show that the gauge symmetries of neural ODEs are given by spacetime diffeomorphisms, which play a fundamental role in Einstein's theory of gravity, and that the parametric redundancies of feedforward networks are lifted to these diffeomorphisms when neural ODEs are viewed as the continuum version of feedforward networks.

The main contributions of the paper are:

1. Treating model functions as physical observables, so that the parametric redundancies of neural networks and neural ODEs can be interpreted as gauge symmetries.
2. Formulating the parametric redundancies of neural ODEs mathematically and showing that their gauge symmetries are given by spacetime diffeomorphisms.
3. Showing that the parametric redundancies of feedforward neural networks are lifted to diffeomorphisms in neural ODEs, which follows from viewing neural ODEs as the continuum version of feedforward networks.
4. Identifying natural correspondences between Transformers and neural ODEs, so that the gauge symmetries of Transformers can be analyzed in parallel with the cases above.

In this way, the paper offers a unified perspective for analyzing the internal symmetries of various machine learning architectures and connects them to fundamental principles in physics, which may help in understanding the complex behavior of deep learning models.
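To make the idea of parametric redundancy in a feedforward network concrete, here is a minimal sketch using two well-known redundancies of a two-layer ReLU network: permuting the hidden units, and rescaling each hidden unit by a positive factor while dividing the outgoing weights by the same factor. These are standard textbook examples chosen for illustration; the summary does not say which redundancies the paper itself analyzes, and the network sizes, random seed, and function name `relu_mlp` are assumptions.

```python
import math
import random


def relu_mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network: x -> W2 @ relu(W1 @ x + b1) + b2."""
    hidden = [
        max(sum(w * xi for w, xi in zip(row, x)) + b, 0.0)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def close(u, v):
    """Element-wise comparison up to floating-point round-off."""
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(u, v))


random.seed(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [random.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [random.gauss(0, 1) for _ in range(d_out)]
x = [random.gauss(0, 1) for _ in range(d_in)]

y = relu_mlp(x, W1, b1, W2, b2)

# Redundancy 1: relabel the hidden units (rows of W1, columns of W2).
perm = list(range(d_hidden))
random.shuffle(perm)
y_perm = relu_mlp(
    x,
    [W1[i] for i in perm],
    [b1[i] for i in perm],
    [[row[i] for i in perm] for row in W2],
    b2,
)

# Redundancy 2: positive rescaling, using relu(s * z) = s * relu(z) for s > 0.
scale = [random.uniform(0.5, 2.0) for _ in range(d_hidden)]
y_scaled = relu_mlp(
    x,
    [[s * w for w in row] for s, row in zip(scale, W1)],
    [s * b for s, b in zip(scale, b1)],
    [[w / s for w, s in zip(row, scale)] for row in W2],
    b2,
)

same_under_permutation = close(y, y_perm)
same_under_rescaling = close(y, y_scaled)
```

Both transformed parameter sets differ from the original, yet the model function is unchanged. In the summary's terms, the parameters are the redundant description and the model function is the observable.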

Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE

Physical Symmetries Embedded in Neural Networks

Transformer models are gauge invariant: A mathematical connection between AI and particle physics

Symmetry-regularized neural ordinary differential equations

Complexity from Adaptive-Symmetries Breaking: Global Minima in the Statistical Mechanics of Deep Neural Networks

Gauge Invariant and Anyonic Symmetric Transformer and RNN Quantum States for Quantum Lattice Models

A Unified Framework for Interpretable Transformers Using PDEs and Information Theory

Symmetry Breaking in Neural Network Optimization: Insights from Input Dimension Expansion

Symmetries in Overparametrized Neural Networks: A Mean-Field View

Unifying O(3) Equivariant Neural Networks Design with Tensor-Network Formalism

Uncertainty and Structure in Neural Ordinary Differential Equations

The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory

Universal Differential Equations as a Common Modeling Language for Neuroscience

Neural Canonical Transformation with Symplectic Flows

Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

Equivariant Transformer is all you need

Embedding Capabilities of Neural ODEs

Transformers as Neural Operators for Solutions of Differential Equations with Finite Regularity

Transformations establishing equivalence across neural networks: When have two networks learned the same task?