How many samples are needed to train a deep neural network?

Pegah Golestaneh, Mahsa Taheri, Johannes Lederer
2024-05-27
Abstract: Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
Statistics Theory, Machine Learning
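The abstract's contrast between the rates \(1/\sqrt{n}\) and \(1/n\) has a concrete consequence for data budgets. If the error behaves like \(C n^{-\alpha}\), halving it requires multiplying \(n\) by \(2^{1/\alpha}\): four times as many samples for \(\alpha = 1/2\), but only twice as many for \(\alpha = 1\). The Python sketch below illustrates this under that assumed power law. The helper names `fit_rate_exponent` and `samples_needed` are hypothetical, the error values are synthetic, and the log-log least-squares fit is a generic way to read off a rate, not necessarily the procedure used in the paper's experiments.

```python
import math


def fit_rate_exponent(ns, errors):
    """Estimate alpha in error ~ C * n**(-alpha) by least squares on a log-log scale.

    Hypothetical helper: assumes the errors follow a power law in n.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope of the ordinary least-squares line through (log n, log error).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -sxy / sxx  # the error decreases in n, so the slope equals -alpha


def samples_needed(n0, err0, target_err, alpha):
    """Sample size at which C * n**(-alpha) drops from err0 (observed at n0) to target_err."""
    return n0 * (err0 / target_err) ** (1.0 / alpha)


if __name__ == "__main__":
    ns = [100, 400, 1600, 6400]
    # Synthetic errors that follow each rate exactly (not data from the paper).
    err_sqrt = [2.0 / math.sqrt(n) for n in ns]
    err_param = [2.0 / n for n in ns]
    print(round(fit_rate_exponent(ns, err_sqrt), 3))   # 0.5
    print(round(fit_rate_exponent(ns, err_param), 3))  # 1.0
    # Halving the error from 0.1 observed at n = 1000:
    print(samples_needed(1000, 0.1, 0.05, alpha=0.5))  # 4000.0
    print(samples_needed(1000, 0.1, 0.05, alpha=1.0))  # 2000.0
```

Fitting on the log-log scale turns the power law into a straight line, so the rate exponent is simply the negated slope; with noisy real measurements the estimate would only be approximate.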
What problem does this paper attempt to address?
### What problem does this paper attempt to solve?

This paper studies how many samples are needed to train deep neural networks, specifically ReLU feed-forward neural networks. The authors address the following questions:

1. **How many samples are needed to train a ReLU feed-forward neural network?**
   - Through theoretical and empirical analysis, the paper finds that the generalization error of ReLU feed-forward neural networks decreases with the sample size \(n\) at the rate \(1/\sqrt{n}\), rather than at the usual parametric rate \(1/n\). This indicates that deep neural networks do indeed require "many" training samples.
2. **How can a lower bound on the minimax risk of deep ReLU feed-forward neural networks be established?**
   - The authors use Fano's inequality, an information-theoretic technique, to derive a lower bound on the minimax risk of ReLU feed-forward neural networks. The bound is of order \(\sqrt{\log(d)/n}\): it grows logarithmically in the input dimension \(d\) and decreases as \(1/\sqrt{n}\) in the sample size \(n\).
3. **Empirical verification of the theoretical result**
   - Beyond the theoretical proofs, the authors support this conclusion with empirical studies covering both regression and classification tasks.

### Main contributions

1. **A lower bound on the minimax risk of ReLU feed-forward neural networks**
   - The bound depends logarithmically on the input dimension and on the sparsity level of the parameter space, and it decreases at the rate \(1/\sqrt{n}\) as the number of training samples \(n\) increases. It matches recently established upper bounds.
2. **Empirical support**
   - In addition to the theoretical proofs, the authors corroborate the result with experiments on various network architectures, including convolutional neural networks (CNNs).
3. **A lower bound on the packing number of deep ReLU networks**
   - According to the paper, this is the first such result in the literature; it offers a new perspective on the complexity and capacity of deep neural networks.

### Broader impacts

- **Practical implications**
  - Most practitioners believe that deep learning requires more data than classical methods. The error of classical linear regression methods (such as least-squares estimation) typically decreases at the rate \(1/n\) in the sample size, whereas the mathematical proofs and numerical results in this paper show that non-linear deep learning converges at the slower rate \(1/\sqrt{n}\). This gives theoretical backing to practitioners' intuition.
- **Complementing the engineering perspective**
  - The paper examines an important aspect of deep learning from a rigorous mathematical perspective, complementing the field's extensive "engineering perspective".

In conclusion, by combining theory and experiments, this paper systematically addresses how many samples are needed to train deep neural networks and offers both theoretical and practical guidance.
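For orientation, the generic textbook form of Fano's method, the information-theoretic technique referred to above, can be sketched for a Gaussian regression model \(y_i = f(x_i) + \varepsilon_i\) with \(\varepsilon_i \sim \mathcal{N}(0,\sigma^2)\) and i.i.d. inputs \(x_i\), where \(\|\cdot\|_{L_2}\) is the \(L_2\) norm with respect to the input distribution. The packing \(f_1,\dots,f_M\), the separation \(\delta\), and the constant \(C\) are assumptions of this sketch; the paper's own construction over sparse ReLU networks and its packing-number bound are not reproduced here. Suppose \(2\delta \le \|f_j - f_k\|_{L_2} \le C\delta\) for all \(j \ne k\). Then

\[
\inf_{\hat f}\,\max_{1\le j\le M}\,\mathbb{P}_j\bigl(\|\hat f - f_j\|_{L_2} \ge \delta\bigr)
\;\ge\; 1 - \frac{\max_{j\ne k}\mathrm{KL}\bigl(\mathbb{P}_j^{\otimes n}\,\|\,\mathbb{P}_k^{\otimes n}\bigr) + \log 2}{\log M},
\qquad
\mathrm{KL}\bigl(\mathbb{P}_j^{\otimes n}\,\|\,\mathbb{P}_k^{\otimes n}\bigr) = \frac{n}{2\sigma^2}\,\|f_j - f_k\|_{L_2}^2 \;\le\; \frac{n C^2\delta^2}{2\sigma^2}.
\]

Choosing \(\delta^2 = \sigma^2 \log M / (2C^2 n)\) makes the KL term at most \((\log M)/4\). If also \(M \ge 16\), so that \(\log 2 \le (\log M)/4\), the right-hand side is at least \(1/2\), and therefore

\[
\inf_{\hat f}\,\max_{1\le j\le M}\,\mathbb{E}_j\|\hat f - f_j\|_{L_2} \;\ge\; \frac{\delta}{2} \;=\; \frac{\sigma}{2C}\sqrt{\frac{\log M}{2n}}.
\]

For a packing whose log-cardinality \(\log M\) is of order \(\log d\), this yields the \(\sqrt{\log(d)/n}\) behavior described in the summary; the sparsity dependence enters through how large a well-separated packing the network class admits.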