Abstract: There is a longstanding debate about whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The Kolmogorov-Arnold representation decomposes a multivariate function into an interior and an outer function and therefore has indeed a structure similar to that of a neural network with two hidden layers. But there are distinctive differences. One of the main obstacles is that the outer function depends on the represented function and can be wildly varying even if the represented function is smooth. We derive modifications of the Kolmogorov-Arnold representation that transfer smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks. It appears that instead of two hidden layers, a more natural interpretation of the Kolmogorov-Arnold representation is that of a deep neural network where most of the layers are required to approximate the interior function.
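To make the "interior function plus outer function" structure concrete, here is a minimal Python sketch that evaluates a function of the classical Kolmogorov-Arnold form f(x_1, ..., x_d) = sum_{q=0}^{2d} g_q(sum_{p=1}^{d} psi_{p,q}(x_p)) from user-supplied univariate inner functions psi_{p,q} and outer functions g_q. The names `ka_superposition`, `inner` and `outer` are hypothetical. The sketch does not construct the functions whose existence the theorem guarantees (those are generally very irregular); the product example is only a hand-picked toy instance of the same two-stage structure.

```python
from typing import Callable, Sequence

Univariate = Callable[[float], float]


def ka_superposition(
    x: Sequence[float],
    inner: Sequence[Sequence[Univariate]],  # inner[q][p] plays the role of psi_{p,q}
    outer: Sequence[Univariate],            # outer[q] plays the role of g_q
) -> float:
    """Evaluate sum_{q=0}^{2d} g_q(sum_{p=1}^{d} psi_{p,q}(x_p))."""
    d = len(x)
    if len(outer) != 2 * d + 1 or len(inner) != 2 * d + 1:
        raise ValueError("expected 2d + 1 outer functions and 2d + 1 rows of inner functions")
    total = 0.0
    for g_q, psi_q in zip(outer, inner):
        if len(psi_q) != d:
            raise ValueError("each row of inner functions needs one entry per coordinate")
        # "First hidden layer": apply the univariate inner functions and add them up.
        z_q = sum(psi(x_p) for psi, x_p in zip(psi_q, x))
        # "Second hidden layer": apply the outer function to the aggregated value.
        total += g_q(z_q)
    return total


def identity(t: float) -> float:
    return t


def negate(t: float) -> float:
    return -t


def zero(t: float) -> float:
    return 0.0


# Toy instance for d = 2: x1 * x2 = ((x1 + x2)^2 - (x1 - x2)^2) / 4.
# Only two of the 2d + 1 = 5 terms are used; the remaining terms are identically zero.
inner_fns = [
    [identity, identity],  # q = 0: x1 + x2
    [identity, negate],    # q = 1: x1 - x2
    [zero, zero],
    [zero, zero],
    [zero, zero],
]
outer_fns = [
    lambda t: t * t / 4,
    lambda t: -t * t / 4,
    zero,
    zero,
    zero,
]

print(ka_superposition([0.3, 0.7], inner_fns, outer_fns))  # approximately 0.21
```

In the theorem itself the inner functions do not depend on f and all the dependence on f sits in the outer functions; in this toy example both were simply chosen by hand.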

What problem does this paper attempt to address?

The core problem this paper addresses is whether the Kolmogorov-Arnold representation theorem (KA representation theorem) is relevant for explaining the structure of multi-layer neural networks. Specifically, the paper asks whether the KA representation theorem can explain why it is beneficial to use more than one hidden layer in a neural network. Although the theorem shows that a multivariate function can be represented by a specific network with two hidden layers, this interpretation is controversial. One of the main obstacles is that the outer function depends on the represented function and may vary wildly even if the represented function is smooth. To overcome these limitations, the author derives modified versions of the KA representation theorem that transfer the smoothness properties of the represented function to the outer function and that can be well approximated by ReLU networks. The author argues that, rather than as a neural network with two hidden layers, the KA representation is more naturally interpreted as a deep neural network in which most layers are used to approximate the inner function.

### Main contributions of the paper

1. **Modified versions of the KA representation theorem**:
   - New versions of the KA representation theorem are proposed that are easy to prove and that transfer the smoothness properties of the multivariate function to the outer function.
   - These modified versions bring the KA representation theorem closer to deep ReLU networks.
2. **Construction of deep ReLU networks**:
   - Based on the modified KA representation theorem, a deep ReLU network is constructed that is optimal in terms of the number of parameters.
   - By turning the approximation of a multivariate function into the approximation of univariate functions, the number of parameters is reduced and the network becomes more efficient.
3. **Theoretical analysis**:
   - The relationship between the KA representation theorem and space-filling curves is analyzed in detail.
   - It is proved that, under certain conditions, the KA representation can be converted into a deep ReLU network without losing the approximation rate.

### Formula summary

- **Classical form of the KA representation theorem**:
  \[
  f(x_1, \ldots, x_d)=\sum_{q = 0}^{2d}g_q\left(\sum_{p = 1}^d\psi_{p,q}(x_p)\right)
  \]
- **Improved KA representation theorem**:
  \[
  f(x_1, \ldots, x_d)=\sum_{q = 0}^{2d}g\left(\sum_{p = 1}^d b_p\psi(x_p+q a)+c_q\right)
  \]
- **Approximation error for smooth functions**:
  \[
  \|f-\tilde{f}\|_p\leqslant2\left(Q+\|f\|_\infty\right)2^{-\beta K}
  \]

### Conclusion

By improving the KA representation theorem, this paper offers a new perspective on the structure and function of multi-layer neural networks. The improvements strengthen the connection between the KA representation theorem and deep ReLU networks and provide a theoretical basis for designing more efficient neural networks.
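The summary stresses that the modified representation reduces the task to approximating univariate functions, which ReLU networks handle well. As a simplified illustration (not the paper's deep construction), the sketch below builds a one-hidden-layer ReLU network that exactly reproduces the piecewise-linear interpolant of a univariate function g on the uniform grid with 2^K cells, and reports the sup-norm error for a few K. The helper names `fit_relu_interpolant` and `relu_net` and the choice g = sqrt are hypothetical. For sqrt, which is Hölder continuous with exponent 1/2, the error of this interpolant is at most 2^(-K/2)/4, so it shrinks like 2^(-beta K) with beta = 1/2; this only mirrors the shape of the bound in the formula summary and is not a reproduction of it.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def fit_relu_interpolant(g, K: int):
    """Return (bias, weights, knots) of a one-hidden-layer ReLU network
    s(x) = bias + sum_k weights[k] * relu(x - knots[k]) that equals the
    piecewise-linear interpolant of g on the grid k / 2**K, k = 0, ..., 2**K."""
    n = 2 ** K
    t = np.linspace(0.0, 1.0, n + 1)          # grid points t_0, ..., t_n
    y = g(t)
    slopes = np.diff(y) / np.diff(t)           # slope on each cell [t_k, t_{k+1}]
    weights = np.diff(slopes, prepend=0.0)     # change of slope at each knot t_k
    return y[0], weights, t[:-1]


def relu_net(x: np.ndarray, bias: float, weights: np.ndarray, knots: np.ndarray) -> np.ndarray:
    # Hidden layer: one ReLU unit per knot; output layer: weighted sum plus bias.
    hidden = relu(x[:, None] - knots[None, :])
    return bias + hidden @ weights


x_test = np.linspace(0.0, 1.0, 10_001)
for K in range(2, 9, 2):
    bias, w, knots = fit_relu_interpolant(np.sqrt, K)
    err = np.max(np.abs(relu_net(x_test, bias, w, knots) - np.sqrt(x_test)))
    print(f"K={K}: {2 ** K} hidden units, sup error {err:.4f}, 2^(-K/2) = {2.0 ** (-K / 2):.4f}")
```

Each hidden unit adds a kink at one grid point, so the network has 2^K hidden units for 2^K cells; the paper's point is that deeper ReLU networks can approximate such univariate pieces far more economically.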

The Kolmogorov-Arnold representation theorem revisited
