Abstract: Kolmogorov and Arnold, in answering Hilbert's 13th problem (in the context of continuous functions), laid the foundations for the modern theory of Neural Networks (NNs). Their proof divides the representation of a multivariate function into two steps: The first (non-linear) inter-layer map gives a universal embedding of the data manifold into a single hidden layer whose image is patterned in such a way that a subsequent dynamic can then be defined to solve for the second inter-layer map. I interpret this pattern as "minor concentration" of the almost everywhere defined Jacobians of the interlayer map. Minor concentration amounts to sparsity for higher exterior powers of the Jacobians. We present a conceptual argument for how such sparsity may set the stage for the emergence of successively higher order concepts in today's deep NNs and suggest two classes of experiments to test this hypothesis.
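
For reference, the two-step structure described in the abstract corresponds to the standard statement of the Kolmogorov-Arnold theorem, written out below. The symbols ($f$, $\varphi_{q,p}$, $\Phi_q$) are the conventional ones and are not taken from the abstract itself.

```latex
% Kolmogorov-Arnold representation of a continuous f : [0,1]^n -> R
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
% First inter-layer map (can be chosen independently of f), embedding [0,1]^n into R^{2n+1}:
%   x \mapsto \Big( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \Big)_{q = 0, \dots, 2n}
% Second inter-layer map (depends on f): the continuous univariate outer functions \Phi_q.
```

Here the inner sums play the role of the universal embedding into a single hidden layer of width $2n+1$, and the outer functions $\Phi_q$ play the role of the second inter-layer map that is solved for afterwards.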

What problem does this paper attempt to address?

The problem this paper attempts to address is how the proof of the Kolmogorov-Arnold (KA) theorem can be used to better understand the learning mechanisms of modern Neural Networks (NNs). Specifically:

1. **Background of the KA Theorem**: Kolmogorov and Arnold laid the foundation for modern neural network theory while answering Hilbert's 13th problem. Their proof divides the representation of a multivariate function into two steps: first, a nonlinear inter-layer map embeds the data manifold into a single hidden layer; then a subsequent dynamic is defined to solve for the second inter-layer map.
2. **Key Concepts**: The authors interpret the pattern of this embedding as "minor concentration" of the Jacobians of the inter-layer map, which are defined almost everywhere. Minor concentration amounts to sparsity of the higher exterior powers of those Jacobians. This sparsity may set the stage for the emergence of successively higher-order concepts in modern deep neural networks.
3. **Experimental Suggestions**: The authors propose two classes of experiments to test this hypothesis:
   - **Searching for minor concentration in natural Jacobian mappings**: Observing how the data manifold flows through the network during training and looking for minor concentration in the Jacobians.
   - **Evaluating the impact of enforced minor concentration on learning**: Alternating during training between a traditional training protocol and a new objective function that measures minor concentration, and studying the effect on learning.
4. **Theoretical Significance**: The authors believe that the proof of the KA theorem is more important than its statement, because the proof reveals mechanisms that may also underlie neural network training. Understanding and applying these mechanisms may help improve how neural networks are trained.

In summary, this paper aims to explore the insights of the KA theorem's proof on the learning mechanisms of modern neural networks and proposes specific experimental methods to verify these theoretical hypotheses.
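
To make "minor concentration" concrete, the sketch below computes every k x k minor of a Jacobian (the entries of its k-th exterior power in the standard basis) and reports how much of the total squared mass sits in the largest few minors. This is only an illustration under stated assumptions: the ReLU layer, the helper names (`relu_layer_jacobian`, `compound_matrix`, `minor_concentration`), and the "top-fraction share" metric are hypothetical choices made here, not definitions taken from the paper.

```python
from itertools import combinations

import numpy as np


def relu_layer_jacobian(W, b, x):
    """Jacobian of x -> relu(W @ x + b), defined wherever no pre-activation is 0."""
    pre = W @ x + b
    mask = (pre > 0).astype(float)  # derivative of ReLU, unit by unit
    return mask[:, None] * W        # same as diag(mask) @ W


def compound_matrix(J, k):
    """Matrix of all k x k minors of J (k-th exterior power in the standard basis)."""
    m, n = J.shape
    rows = list(combinations(range(m), k))
    cols = list(combinations(range(n), k))
    C = np.empty((len(rows), len(cols)))
    for a, r in enumerate(rows):
        for c_idx, c in enumerate(cols):
            C[a, c_idx] = np.linalg.det(J[np.ix_(r, c)])
    return C


def minor_concentration(J, k, top_fraction=0.1):
    """Share of squared minor mass carried by the largest `top_fraction` of minors.

    Hypothetical metric: values near 1 mean a few minors dominate (sparse
    exterior power); values near `top_fraction` mean the mass is spread evenly.
    """
    sq = np.sort(compound_matrix(J, k).ravel() ** 2)[::-1]
    total = sq.sum()
    if total == 0.0:
        return 0.0
    n_top = max(1, int(np.ceil(top_fraction * sq.size)))
    return float(sq[:n_top].sum() / total)


# Usage: a random ReLU layer from R^3 to R^5, evaluated at one random input.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
b = rng.normal(size=5)
x = rng.normal(size=3)
J = relu_layer_jacobian(W, b, x)
for k in (1, 2, 3):
    print(k, minor_concentration(J, k))
```

In the spirit of the first proposed experiment, a score like this could be tracked across layers, inputs, and training steps. The number of minors grows combinatorially with k and layer width, so a practical study would presumably need to sample minors rather than enumerate them.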

The Proof of Kolmogorov-Arnold May Illuminate Neural Network Learning
