On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
DOI: https://doi.org/10.48550/arXiv.1806.05159
2019-07-04
Abstract: We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks. Through introducing a new characterization of the Lipschitz properties of neural network family, we achieve significantly tighter generalization bounds than existing results. Moreover, we show that the generalization bound can be further improved for bounded losses. Aside from the general feedforward deep neural networks, our results can be applied to derive new bounds for popular architectures, including convolutional neural networks (CNNs) and residual networks (ResNets). When achieving same generalization errors with previous arts, our bounds allow for the choice of larger parameter spaces of weight matrices, inducing potentially stronger expressive ability for neural networks. Numerical evaluation is also provided to support our theory.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the following problems:

1. **Establish tighter generalization error bounds**: The authors aim to establish margin-based, data-dependent generalization error bounds for deep neural networks (DNNs). These bounds depend on the depth and width of the network as well as on its Jacobian. By introducing a new characterization of the Lipschitz properties of the network family, the authors significantly tighten existing generalization bounds.
   - **Specific problems**:
     - (Q1) Can tighter generalization error bounds be established for deep neural networks based on network dimensions and weight matrix structures?
     - (Q2) Can generalization bounds be developed for neural networks with special architectures such as convolutional neural networks (CNNs) and residual networks (ResNets)?
2. **Improve existing results**: For (Q1), the authors point out that existing work has established generalization bounds in terms of the depth \( D \) and width \( p \) of the network and norms of the weight matrices \( W_d \), each of rank \( r \). These bounds can be loose, especially when they depend on the Frobenius norm or a mixed norm. For example, \( \|W_d\|_F \) and \( \|W_d\|_{2,1} \) are usually \(\sqrt{r}\) times larger than the spectral norm \( \|W_d\|_2 \). The authors therefore propose a new generalization error bound of order \(\tilde{O}\left(\|\text{Jacobian}\|_2 \sqrt{\frac{Dpr}{m}}\right)\), where \( m \) is the number of training samples. The bound is based on the Jacobian of the network and is tighter than existing results.
3. **Generalization bounds for special architectures**: For (Q2), the authors consider two widely used architectures, convolutional neural networks (CNNs) and residual networks (ResNets), and characterize their capacity. In particular, they consider orthogonal filters and normalized weight matrices, settings associated with good performance in both optimization and generalization.
4. **Extension to width-varying operations**: The authors also consider common width-expanding and width-reducing operations (such as padding and pooling) and prove that these operations do not increase the generalization bounds.

### Summary

The core objective of this paper is to provide tighter generalization error bounds for deep neural networks. The bounds apply not only to general feedforward networks but also to specific architectures such as CNNs and ResNets. Using a new Lipschitz analysis, the authors show theoretically, and support with numerical evaluation, that the proposed bounds are tighter than existing results. This helps to explain the success of deep neural networks in practice and provides a theoretical basis for further improving network performance.
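The norm comparison behind item 2 can be checked numerically. The sketch below is not taken from the paper: the helper names (`norm_summary`, `equal_singular_value_matrix`) and the sizes `p = 64`, `r = 16` are assumptions chosen for illustration. It builds a rank-\( r \) matrix whose nonzero singular values are all equal, which is the case where \( \|W\|_F = \sqrt{r}\,\|W\|_2 \) holds exactly, and it takes \( \|W\|_{2,1} \) to be the sum of the column Euclidean norms.

```python
import numpy as np


def norm_summary(W):
    """Return the spectral, Frobenius, and (2,1) norms and the numerical rank of W."""
    spectral = np.linalg.norm(W, 2)             # largest singular value
    frobenius = np.linalg.norm(W, "fro")        # sqrt of the sum of squared singular values
    two_one = np.linalg.norm(W, axis=0).sum()   # sum of column Euclidean norms (assumed definition)
    rank = int(np.linalg.matrix_rank(W))
    return spectral, frobenius, two_one, rank


def equal_singular_value_matrix(p, r, seed=0):
    """Hypothetical test matrix: p x p with exactly r singular values equal to 1."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((p, r)))  # p x r with orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((p, r)))  # p x r with orthonormal columns
    return U @ V.T


p, r = 64, 16  # illustrative width and rank, not values from the paper
W = equal_singular_value_matrix(p, r)
spectral, frobenius, two_one, rank = norm_summary(W)

print(f"rank = {rank}")
print(f"spectral = {spectral:.4f}, Frobenius = {frobenius:.4f}, (2,1) = {two_one:.4f}")
print(f"Frobenius / spectral = {frobenius / spectral:.4f}, sqrt(r) = {np.sqrt(r):.4f}")
```

For a general rank-\( r \) matrix the Frobenius-to-spectral ratio lies between 1 and \(\sqrt{r}\), and the (2,1) norm is never smaller than the Frobenius norm. A bound stated in terms of the spectral norm can therefore be smaller by up to that factor per layer.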