Abstract: Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem from economics to represent preference relations in terms of utility functions.
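
In symbols, the two universal-approximation statements in the abstract can be written roughly as follows. This is a sketch under assumed notation, not the paper's exact theorem: the compact input domain $\mathcal{X}$, the continuity of the relation function $r$, the uniform (sup-norm) error, and the symbols $\varphi$, $\psi$, $\varepsilon$ are assumptions introduced here for illustration.

$$
\text{Symmetric, positive-definite } r:\quad \sup_{x,y \in \mathcal{X}} \bigl| r(x,y) - \langle \varphi(x), \varphi(y) \rangle \bigr| \le \varepsilon \quad \text{for some MLP } \varphi,
$$

$$
\text{General (asymmetric) } r:\quad \sup_{x,y \in \mathcal{X}} \bigl| r(x,y) - \langle \varphi(x), \psi(y) \rangle \bigr| \le \varepsilon \quad \text{for some MLPs } \varphi \ne \psi.
$$

The first form is symmetric in $x$ and $y$ by construction, which is why a single network suffices only for symmetric relations; the second uses two different networks to break that symmetry.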

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper studies how well inner products of neural networks approximate relation functions between objects. Specifically, the authors address three main issues:

1. **Approximation of Symmetric Positive-Definite Relation Functions**:
   - The authors show that the inner product of a multi-layer perceptron (MLP) with itself is a universal approximator for symmetric positive-definite relation functions.
   - They give an upper bound on the number of neurons required to achieve a given approximation accuracy, and note that this function class can be identified with the kernels of reproducing kernel Hilbert spaces (RKHS).
2. **Approximation of Asymmetric Relation Functions**:
   - The authors further prove that the inner product of two different multi-layer perceptrons is a universal approximator for asymmetric relation functions.
   - They again give an upper bound on the number of neurons required to achieve a given approximation accuracy, and note that this function class can be identified with the kernels of reproducing kernel Banach spaces (RKBS).
3. **Analysis of Attention Mechanisms**:
   - The authors apply these approximation results to the attention mechanism in Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations.
   - This result relies on the Debreu representation theorem from economics, which represents preference relations in terms of utility functions.

### Main Contributions

- **Theoretical Foundation**: The paper establishes approximation guarantees for inner products of neural networks on both symmetric and asymmetric relation functions, extending the classical universal approximation theory of neural networks.
- **Practical Application**: Applying these results to attention gives a new perspective on models such as Transformers, showing that attention can approximate retrieval mechanisms defined by a preorder over objects.
- **Mathematical Tools**: The paper uses Mercer's theorem, reproducing kernel Hilbert spaces, and reproducing kernel Banach spaces to prove its approximation results for neural network inner products.

### Conclusion

The paper shows that inner products of neural networks can model both symmetric and asymmetric relation functions, and applies this to the analysis of attention mechanisms, providing theoretical support for the understanding and design of deep learning models.
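
The three constructions summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's construction: the networks are small tanh MLPs with fixed random weights rather than trained approximators, and the names `make_mlp`, `r_sym`, `r_asym`, and `attention_weights` are hypothetical.

```python
import math
import random


def make_mlp(d_in, d_hidden, d_out, seed):
    """Return a one-hidden-layer tanh MLP with fixed random weights (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]

    def feature_map(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return feature_map


def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


phi = make_mlp(2, 8, 4, seed=0)  # shared network for the symmetric case
psi = make_mlp(2, 8, 4, seed=1)  # second, different network for the asymmetric case


def r_sym(x, y):
    """Symmetric relation: inner product of one MLP with itself."""
    return inner(phi(x), phi(y))


def r_asym(x, y):
    """Asymmetric relation: inner product of two different MLPs."""
    return inner(phi(x), psi(y))


def attention_weights(query, keys, scale=1.0):
    """Softmax over inner-product relations between a query and each key."""
    return softmax([scale * r_asym(query, k) for k in keys])


points = [[0.0, 1.0], [1.0, -0.5], [-0.7, 0.3]]
gram = [[r_sym(x, y) for y in points] for x in points]
weights = attention_weights(points[0], points)
```

The Gram matrix built from `r_sym` is symmetric and positive semi-definite for any choice of weights, which matches the restriction to symmetric positive-definite relations in the first case. `r_asym` carries no such constraint, and in `attention_weights` it plays the role of a query-key score: increasing `scale` concentrates the softmax on the key with the highest score, which is the sense in which attention retrieves the most preferred element.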

Approximation of relation functions and attention mechanisms

Approximation by non-symmetric networks for cross-domain learning

Approximation of RKHS Functionals by Neural Networks

Why Deep Neural Networks for Function Approximation?

Universal Approximation of Multiple Nonlinear Operators by Neural Networks

Function Approximation with Randomly Initialized Neural Networks for Approximate Model Reference Adaptive Control

On the Approximation and Complexity of Deep Neural Networks to Invariant Functions

Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons

Universal Approximation Theorem for Neural Networks

Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions

Universal Approximation Abilities of a Modular Differentiable Neural Network

Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions

Optimal Neural Network Approximation for High-Dimensional Continuous Functions

Integral transform and its application to neural network approximation

Approximations of Continuous Functionals by Neural Networks with Application to Dynamic Systems

Approximation Power of Deep Neural Networks: an explanatory mathematical survey

Approximation Bounds by Neural Networks in $L^p_\omega$

On the Expressive Power of Neural Networks

The approximation capabilities of Durrmeyer-type neural network operators

A Theory of Interpretable Approximations

Universal Approximation Using Radial-Basis-Function Networks