Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng
DOI: https://doi.org/10.48550/arXiv.2007.02486
2021-09-25
Abstract: Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the $L_2$ estimation error with respect to the GD iterations, which stays bounded away from zero unless a delicate early-stopping scheme is used. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of $L_2$ estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves training robustness and works for a wider range of neural networks.
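To make result (1) more concrete, here is a minimal NumPy sketch of the kernel ridge regression (KRR) estimator with a neural tangent kernel (NTK). It assumes a common parametrization, not necessarily the paper's: $f(x) = m^{-1/2}\sum_{r=1}^{m} a_r\,\mathrm{ReLU}(w_r^\top x)$ with $w_r \sim N(0, I)$, fixed output weights $a_r \in \{\pm 1\}$, and only the hidden-layer weights trained. Under these assumptions the infinite-width NTK is $K(x,z) = x^\top z\,(\pi - \theta(x,z))/(2\pi)$, where $\theta(x,z)$ is the angle between $x$ and $z$. The paper's exact network (bias terms, scaling, initialization) and its scaling of the regularization parameter may differ. The function names `ntk_relu`, `krr_fit`, `krr_predict`, the target function and the noise level are hypothetical and for illustration only.

```python
import numpy as np

def ntk_relu(X, Z):
    """Assumed infinite-width NTK of a one-hidden-layer ReLU net (hidden layer trained only).

    K(x, z) = x^T z * (pi - theta(x, z)) / (2 * pi), theta = angle between x and z.
    Assumes every row of X and Z is nonzero.
    """
    dot = X @ Z.T
    norms = np.linalg.norm(X, axis=1)[:, None] * np.linalg.norm(Z, axis=1)[None, :]
    cos = np.clip(dot / norms, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    theta = np.arccos(cos)
    return dot * (np.pi - theta) / (2.0 * np.pi)

def krr_fit(X, y, lam):
    """KRR: minimize (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2.

    By the representer theorem f = sum_i alpha_i K(., x_i) with
    alpha = (K + n * lam * I)^{-1} y.
    """
    n = X.shape[0]
    K = ntk_relu(X, X)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_new):
    """Evaluate the fitted KRR function at the rows of X_new."""
    return ntk_relu(X_new, X_train) @ alpha

# Illustrative usage: noisy samples of a hypothetical target on the unit sphere.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
alpha = krr_fit(X, y, lam=1e-3)
y_hat = krr_predict(X, alpha, X)
```

A larger `lam` shrinks the fitted values toward zero. As `lam` approaches zero, the estimator approaches an interpolant of the noisy labels. This loosely mirrors the abstract's contrast between unregularized GD, which can overfit noise, and its $\ell_2$-regularized counterpart.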
Machine Learning