The Kolmogorov-Arnold representation theorem revisited

Johannes Schmidt-Hieber
DOI: https://doi.org/10.48550/arXiv.2007.15884
2021-01-03
Abstract: There is a longstanding debate whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The Kolmogorov-Arnold representation decomposes a multivariate function into an interior and an outer function and therefore has indeed a similar structure as a neural network with two hidden layers. But there are distinctive differences. One of the main obstacles is that the outer function depends on the represented function and can be wildly varying even if the represented function is smooth. We derive modifications of the Kolmogorov-Arnold representation that transfer smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks. It appears that instead of two hidden layers, a more natural interpretation of the Kolmogorov-Arnold representation is that of a deep neural network where most of the layers are required to approximate the interior function.
Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The paper asks whether the Kolmogorov-Arnold representation theorem (KA representation theorem) is relevant for explaining the structure of multi-layer neural networks, and specifically whether it can explain why using more than one hidden layer is beneficial. Although the theorem shows that a multivariate function can be represented by a specific network with two hidden layers, this interpretation is controversial. One of the main obstacles is that the outer function depends on the function being represented and may vary wildly even if that function is smooth.

To overcome these limitations, the author derives modified versions of the KA representation theorem that transfer the smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks. The author argues that, rather than a neural network with two hidden layers, a more natural interpretation of the KA representation is a deep neural network in which most layers are used to approximate the interior function.

### Main contributions of the paper

1. **Modified versions of the KA Representation Theorem**:
   - New versions of the KA representation theorem are proposed that are easy to prove and transfer the smoothness properties of the multivariate function to the outer function.
   - These modified versions tighten the connection between the KA representation theorem and deep ReLU networks.
2. **Construction of deep ReLU networks**:
   - Based on the modified KA representation theorem, a deep ReLU network is constructed that is optimal in terms of the number of parameters.
   - Reducing the approximation of a multivariate function to the approximation of a univariate function lowers the number of parameters and makes the network more efficient.
3. **Theoretical analysis**:
   - The relationship between the KA representation theorem and space-filling curves is analyzed in detail.
   - It is proved that, under certain conditions, the KA representation can be turned into a deep ReLU network without losing the approximation rate.

### Formula summary

- **Classical form of the KA Representation Theorem**:
  \[
  f(x_1, \ldots, x_d) = \sum_{q=0}^{2d} g_q\left(\sum_{p=1}^{d} \psi_{p,q}(x_p)\right)
  \]
- **Improved KA Representation Theorem**:
  \[
  f(x_1, \ldots, x_d) = \sum_{q=0}^{2d} g\left(\sum_{p=1}^{d} b_p \psi(x_p + q a) + c_q\right)
  \]
- **Approximation error of smooth functions**:
  \[
  \|f - \tilde{f}\|_p \leq 2\left(Q + \|f\|_\infty\right) 2^{-\beta K}
  \]

### Conclusion

By modifying the KA representation theorem, this paper offers a new perspective on the structure and function of multi-layer neural networks. The modifications strengthen the connection between the KA representation theorem and deep ReLU networks and provide a theoretical basis for designing more efficient neural networks.
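
### Illustrative sketch

The idea of reducing a multivariate approximation problem to a univariate one, and its link to space-filling curves, can be illustrated with a toy digit-interleaving scheme. The sketch below is a simplification under stated assumptions and is not the construction used in the paper: the interior map `encode` interleaves the first `K` binary digits of each coordinate into a single number in [0, 1), and the outer function is `g = f ∘ decode`. All names (`encode`, `decode`, `make_outer`, `toy_f`) are hypothetical and chosen for illustration only.

```python
import math


def encode(x, K):
    """Toy interior map: interleave the first K binary digits of each
    coordinate of x in [0, 1)^d into one number t in [0, 1)."""
    d = len(x)
    assert d * K <= 50, "keep d*K small so float arithmetic stays exact"
    t = 0.0
    for j in range(K):          # digit position within each coordinate
        for p in range(d):      # coordinate index
            bit = int(x[p] * 2 ** (j + 1)) % 2
            t += bit * 2.0 ** -(j * d + p + 1)
    return t


def decode(t, d, K):
    """Inverse of encode on its range: recover each coordinate,
    truncated to K binary digits."""
    x = [0.0] * d
    for j in range(K):
        for p in range(d):
            bit = int(t * 2 ** (j * d + p + 1)) % 2
            x[p] += bit * 2.0 ** -(j + 1)
    return x


def make_outer(f, d, K):
    """Toy outer function g(t) = f(decode(t)), a function of ONE variable."""
    return lambda t: f(decode(t, d, K))


def toy_f(x):
    """Smooth example target on [0, 1)^2 (an arbitrary choice for the demo)."""
    return math.sin(x[0]) + x[0] * x[1]


if __name__ == "__main__":
    x, K = [0.3, 0.7], 10
    g = make_outer(toy_f, d=2, K=K)
    approx = g(encode(x, K))    # f evaluated at x truncated to K binary digits
    print(abs(approx - toy_f(x)))
```

Here `g(encode(x, K))` equals `toy_f` evaluated at `x` truncated to `K` binary digits per coordinate, so for a Lipschitz target the error shrinks geometrically in `K`. Plain binary interleaving makes no attempt to keep `g` smooth as a function of `t`; transferring smoothness to the outer function is what the paper's modified representations are designed to achieve.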