Abstract: We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, three regimes, namely under-, over-, and proper quantization, in terms of minimax approximation error behavior as a function of network weight precision, are identified. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights, while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique which could be of independent general interest.

What problem does this paper attempt to address?

The paper explores the fundamental limits of ReLU neural networks in approximating Lipschitz functions when the network weights are represented with finite precision. The authors define three quantization regimes (under-, over-, and proper quantization) and characterize the behavior of the minimax approximation error as a function of network weight precision through nonasymptotic tight upper and lower bounds.

Specifically, the main contributions of the paper include:

1. **Conceptual Contributions**: Identification of three distinct quantization regimes, distinguished by how the minimax approximation error depends on the number of bits \( b \) used per weight:
   - Under-quantization regime: The minimax error decays exponentially as \( b \) increases.
   - Proper-quantization regime: The error decays polynomially in \( b \), and the neural network approximates Lipschitz functions in a memory-optimal manner.
   - Over-quantization regime: The error remains constant in \( b \), so additional bits bring no further improvement.
2. **Technical Contributions**:
   - **Depth-Precision Trade-off**: Shows how to convert a network with high-precision weights into a functionally equivalent network with greater depth but lower weight precision, while maintaining memory optimality. This idea is analogous to sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution.
   - **Improved Approximation Results**: For 1-Lipschitz functions on the interval \( [0,1] \), the paper improves on the best-known approximation results. Specifically, for sufficiently large network width \( W \) and depth \( L \), and with weight magnitudes bounded by 1, the minimax approximation error behaves as \( C(W^2L^2\log(W))^{-1} \), where \( C \) is an absolute constant.
   - **Improved Bit Extraction Technique**: Proposes a refinement of the bit extraction technique for recovering binary strings from real numbers. The new construction requires weight magnitudes that grow only polynomially in network width, whereas traditional constructions require weight magnitudes that grow exponentially with network depth.

Through these contributions, the paper advances the theoretical understanding of ReLU neural networks and provides guidelines for designing efficient network architectures when weights have finite precision.
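The depth-precision trade-off can be illustrated on a single scalar weight. The following is a minimal sketch, not the paper's construction: it assumes a weight \( w \in [0,1) \) stored with \( k \cdot b \) bits and a nonnegative input \( x \), and replaces the single multiplication \( w \cdot x \) with \( k \) ReLU layers whose weights are all multiples of \( 2^{-b} \) in \( [0,1] \). The helper names `split_weight` and `deep_multiply` are hypothetical.

```python
from fractions import Fraction


def relu(t):
    """ReLU on exact rationals."""
    return t if t > 0 else Fraction(0)


def split_weight(w, b, k):
    """Split w (a multiple of 2^(-k*b) in [0, 1)) into k integer chunks
    c_1, ..., c_k in [0, 2^b) such that w = sum_j c_j * 2^(-j*b)."""
    n = int(w * 2 ** (k * b))
    chunks = []
    for _ in range(k):
        chunks.append(n % 2 ** b)  # least significant chunk first
        n //= 2 ** b
    return chunks[::-1]  # return most significant chunk (c_1) first


def deep_multiply(x, chunks, b):
    """Compute w * x for x >= 0 with len(chunks) ReLU layers.
    Every weight used is a multiple of 2^(-b) in [0, 1] (Horner scheme)."""
    scale = Fraction(1, 2 ** b)
    y = Fraction(0)
    for c in reversed(chunks):  # start with the least significant chunk
        # Layer weights: `scale` on the running sum, `c * scale` on the input.
        y = relu(scale * y + (c * scale) * x)
    return y


# An 8-bit weight realized by a depth-4 chain of 2-bit weights.
w = Fraction(181, 256)
chunks = split_weight(w, b=2, k=4)
x = Fraction(3, 5)
assert deep_multiply(x, chunks, b=2) == w * x
```

Exact rational arithmetic is used so that the functional equivalence can be checked with `==` rather than a floating-point tolerance. The sketch only shows that precision can be moved into depth for one weight; the memory-optimality claim concerns the paper's full network construction and is not demonstrated here.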
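For context on the bit extraction item, here is a minimal sketch of a classical-style bit extraction with ReLU units, assuming an input \( x \in [0,1) \) that is an exact multiple of \( 2^{-n} \). It is not the paper's refined construction. It uses a two-ReLU threshold whose slope is as large as \( 2^{n} \), which illustrates why simple versions of the technique lead to very large weight magnitudes. The function name `extract_bits` is hypothetical.

```python
from fractions import Fraction


def relu(t):
    """ReLU on exact rationals."""
    return t if t > 0 else Fraction(0)


def extract_bits(x, n):
    """Recover the n binary digits of x = 0.b1 b2 ... bn (x a multiple of 2^-n).

    Each step applies a two-ReLU threshold at 1/2 and then shifts the
    remaining digits left by one position."""
    bits = []
    for m in range(n, 0, -1):  # m = number of digits still encoded in x
        # x is a multiple of 2^-m, so a is an integer: a >= 0 or a <= -1.
        a = (x - Fraction(1, 2)) * 2 ** m  # slope 2^m: a large weight
        bit = relu(a + 1) - relu(a)  # equals 1 if x >= 1/2, else 0
        bits.append(int(bit))
        x = 2 * x - bit  # drop the extracted digit
    return bits


# 0.1011 in binary is 11/16.
assert extract_bits(Fraction(11, 16), 4) == [1, 0, 1, 1]
```

The threshold is exact here only because the input lies on the grid of multiples of \( 2^{-n} \); off-grid inputs could land in the linear transition region of the two-ReLU step.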

Three Quantization Regimes for ReLU Networks
