Abstract: We consider functions from the real numbers to the real numbers, output by a neural network with 1 hidden activation layer, arbitrary width, and ReLU activation function. We assume that the parameters of the neural network are chosen uniformly at random with respect to various probability distributions, and compute the expected distribution of the points of non-linearity. We use these results to explain why the network may be biased towards outputting functions with simpler geometry, and why certain functions with low information-theoretic complexity are nonetheless hard for a neural network to approximate.
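
To make the setting concrete, the following is a sketch of what such a network computes and where its points of non-linearity lie. The symbols $n$, $c_i$, $w_i$, $b_i$ and $x_i$ are notation assumed here for illustration and are not taken from the paper.

```latex
% One-hidden-layer ReLU network of width n, mapping R to R (assumed notation)
f(x) = c_0 + \sum_{i=1}^{n} c_i \,\max(0,\; w_i x + b_i)

% The i-th hidden unit switches on or off where its pre-activation vanishes:
w_i x_i + b_i = 0 \quad\Longrightarrow\quad x_i = -\frac{b_i}{w_i}, \qquad w_i \neq 0

% At x_i the slope of f jumps by c_i |w_i|, so x_i is a point of
% non-linearity whenever c_i \neq 0 and no other unit cancels the jump.
```

Under this reading, f is piecewise linear with at most n points of non-linearity, and their locations are determined entirely by the ratios of the randomly drawn biases and weights.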

What problem does this paper attempt to address?

The paper studies how the points of non-linearity are distributed in functions generated by random neural networks, and uses this to explain why certain functions with low information-theoretic complexity are still difficult for neural networks to approximate. Specifically, it considers networks with a single hidden layer and ReLU activation whose parameters (weights and biases) are drawn at random from one of several probability distributions. The analysis focuses on the number and location of the points of non-linearity, that is, the points where the slope of the piecewise-linear output changes. The authors find that the distribution of these points depends strongly on the parameter space from which the weights and biases are drawn.

The main contributions of the paper are:

1. **Rectangular Parameter Space**: When weights and biases are selected uniformly at random from a finite interval, the number of points of non-linearity follows a binomial distribution. The paper also derives the expected number of such points in several cases.
2. **Gaussian Parameter Space**: When weights and biases are drawn from a normal distribution, the number of points of non-linearity follows a distribution of the same form as in the rectangular case, but with different coefficients in the probability formula.
3. **Spherical Parameter Space**: Biases are still selected uniformly at random from a finite interval, while weights are selected uniformly at random from within a sphere. For finite domains, the paper gives an expression for the expected number of points of non-linearity and discusses its asymptotic behavior.

These results help explain why some functions, despite having low information-theoretic complexity, may still be difficult for neural networks to approximate. For example, a periodic sawtooth wave has low information-theoretic complexity, yet it is relatively difficult for a ReLU network to learn. In summary, the study identifies biases that neural networks may exhibit toward particular types of functions and provides a theoretical basis for further understanding their learning capabilities.
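
The binomial behavior in the rectangular case can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not the paper's method: it assumes weights and biases are drawn uniformly from [-1, 1], that the domain of interest is [-1, 1], and that every output weight is non-zero (so each hidden unit contributes a visible kink). The function names and parameters are hypothetical. Under these assumptions, each unit's kink at -b/w falls inside the domain independently with probability 1/2, so the count is binomial with mean width/2.

```python
import random


def count_breakpoints(width, domain=1.0, param_range=1.0, rng=random):
    """Count hidden units whose ReLU kink lies inside [-domain, domain].

    Assumption: w and b are uniform on [-param_range, param_range].
    """
    count = 0
    for _ in range(width):
        w = rng.uniform(-param_range, param_range)
        b = rng.uniform(-param_range, param_range)
        # ReLU(w*x + b) has its kink at x = -b/w; that point lies in the
        # domain exactly when |b| <= domain * |w| (avoids dividing by w).
        if abs(b) <= domain * abs(w):
            count += 1
    return count


def mean_breakpoints(width, trials, seed=0):
    """Average the breakpoint count over many independently drawn networks."""
    rng = random.Random(seed)
    total = sum(count_breakpoints(width, rng=rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    width = 20
    estimate = mean_breakpoints(width, trials=20000)
    print(f"width={width}, estimated mean breakpoints={estimate:.3f}")
    print(f"binomial prediction under the stated assumptions: {width * 0.5}")
```

Changing `domain` or `param_range` changes the per-unit probability, which is one way to see why the expected count depends on the choice of parameter space. A target such as a sawtooth with many kinks in a fixed interval needs far more breakpoints there than a typical random draw provides.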

Points of non-linearity of functions generated by random neural networks
