What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper studies how many samples are required to train deep neural networks, specifically ReLU feed-forward neural networks. It addresses the following questions:

1. **How many samples are required to train a ReLU feed-forward neural network?**
   - Through theoretical and empirical analysis, the paper finds that the generalization error of ReLU feed-forward neural networks scales with the number of samples \(n\) at the rate \(1/\sqrt{n}\), rather than the usual parametric rate \(1/n\). This indicates that deep neural networks do indeed require "a large number of" training samples.
2. **How can a lower bound on the minimax risk of deep ReLU feed-forward neural networks be established?**
   - The paper uses Fano's inequality from information theory to establish a lower bound on the minimax risk of ReLU feed-forward neural networks. This lower bound scales as \(\sqrt{\frac{\log(d)}{n}}\): it grows with the logarithm of the input dimension \(d\) and shrinks as the number of samples \(n\) increases.
3. **Does empirical evidence support the theoretical result?**
   - Beyond the theoretical proofs, the paper checks the conclusion empirically on both regression and classification tasks.

### Main contributions

1. **Establishing a lower bound on the minimax risk of ReLU feed-forward neural networks**
   - The lower bound depends on a logarithmic factor of the input dimension and on the sparsity level of the parameter space, and it decreases at the rate \(1/\sqrt{n}\) as the number of training samples \(n\) increases. This result matches recent upper-bound results.
2. **Providing empirical support**
   - In addition to the theoretical proofs, the paper supports the conclusion with experiments covering various network structures, including convolutional neural networks (CNNs).
3. **Deriving a lower bound on the packing number of deep ReLU networks for the first time**
   - According to the summary, this result appears for the first time in the literature and offers a new perspective on the complexity and capacity of deep neural networks.

### Broader impacts

- **Implications in practice**
  - Most practitioners believe that deep learning requires more data than classical methods. The error of classical linear regression methods (such as least-squares estimation) usually decreases with the number of samples at the rate \(1/n\), while the mathematical proofs and numerical results in this paper show that the convergence rate of non-linear deep learning is \(1/\sqrt{n}\), which supports the practitioners' intuition theoretically.
- **Complementing the engineering perspective**
  - The paper examines an important aspect of deep learning from a rigorous mathematical perspective, complementing the rich "engineering perspective" in the field.

In conclusion, through theoretical and empirical research, the paper addresses the question of how many samples are required to train deep neural networks and offers both theoretical and practical guidance.
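To make the gap between the two rates concrete, the following minimal sketch compares how many samples each rate would imply for a given target error. It is an illustration only, not the paper's method: the function names are hypothetical, the unknown constants in both rates are assumed to be 1, and the sparsity and depth factors that the actual bound may contain are ignored.

```python
import math


def samples_for_parametric_rate(target_error: float, constant: float = 1.0) -> int:
    """Smallest n with constant / n <= target_error (parametric 1/n rate).

    `constant` is an assumed placeholder; the true constant is problem-specific.
    """
    return math.ceil(constant / target_error)


def samples_for_relu_rate(target_error: float, input_dim: int,
                          constant: float = 1.0) -> int:
    """Smallest n with constant * sqrt(log(d) / n) <= target_error.

    Uses the sqrt(log(d) / n) scaling quoted in the summary, with an
    assumed constant of 1 and no sparsity or depth factors.
    """
    return math.ceil(constant ** 2 * math.log(input_dim) / target_error ** 2)


if __name__ == "__main__":
    d = 100  # hypothetical input dimension
    for eps in (0.25, 0.125, 0.0625):
        n_parametric = samples_for_parametric_rate(eps)
        n_relu = samples_for_relu_rate(eps, d)
        print(f"target error {eps}: 1/n rate -> n = {n_parametric}, "
              f"sqrt(log(d)/n) rate -> n = {n_relu}")
```

Under these assumptions, halving the target error doubles the sample count implied by the \(1/n\) rate but roughly quadruples the count implied by the \(\sqrt{\log(d)/n}\) rate, which is the practical meaning of the slower rate.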

How many samples are needed to train a deep neural network?

How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network?

A model is worth tens of thousands of examples

Realization of spatial sparseness by deep ReLU nets with massive data

Nonparametric regression using deep neural networks with ReLU activation function

On Size-Independent Sample Complexity of ReLU Networks

Understanding deep learning requires rethinking generalization

How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent

Neural networks with ReLU powers need less depth

Sample Variance Decay in Randomly Initialized ReLU Networks

Understanding deep learning (still) requires rethinking generalization

The sampling complexity of learning invertible residual neural networks

Limitations of neural network training due to numerical instability of backpropagation

Scaling description of generalization with number of parameters in deep learning

Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study

Student Specialization in Deep ReLU Networks With Finite Width and Input Dimension

Sampling Complexity of Deep Approximation Spaces

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks