Why ResNet Works? Residuals Generalize
Fengxiang He, Tongliang Liu, Dacheng Tao
DOI: https://doi.org/10.1109/tnnls.2020.2966319
IF: 14.255
2020-12-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Residual connections significantly boost the performance of deep neural networks. However, few theoretical results address the influence of residuals on the hypothesis complexity and the generalization ability of deep neural networks. This article studies the influence of residual connections on the hypothesis complexity of the neural network in terms of the covering number of its hypothesis space. We first present an upper bound on the covering number of networks with residual connections. This bound shares a similar structure with that of neural networks without residual connections. This result suggests that moving a weight matrix or nonlinear activation from the bone to a vine would not increase the hypothesis space. Afterward, an $\mathcal{O}(1/\sqrt{N})$ margin-based multiclass generalization bound is obtained for ResNet, as an exemplary case of any deep neural network with residual connections. Generalization guarantees for similar state-of-the-art neural network architectures, such as DenseNet and ResNeXt, follow straightforwardly. The obtained generalization bound suggests that, in practice, regularization terms should be introduced to keep the norms of the weight matrices from growing too large, which ensures good generalization ability and justifies the technique of weight decay.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
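
The abstract's closing recommendation, controlling the norms of the weight matrices through weight decay, can be illustrated with a minimal NumPy sketch. This is not the authors' code. The two-layer residual block, the function names, the learning rate, the decay coefficient, and the use of the spectral norm are all assumptions made for illustration; the abstract only refers to "norms of weight matrices" without naming one.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Identity skip connection added to a two-layer transform: x + W2 relu(W1 x)."""
    return x + w2 @ relu(w1 @ x)

def sgd_step_with_weight_decay(w, grad, lr=0.1, weight_decay=0.01):
    """One SGD step on loss + (weight_decay / 2) * ||w||_F^2 (hypothetical settings)."""
    return w - lr * (grad + weight_decay * w)

def spectral_norm_product(weights):
    """Product of spectral norms; an assumed stand-in for the norm terms in the bound."""
    return float(np.prod([np.linalg.norm(w, ord=2) for w in weights]))

rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Forward pass through the residual block.
y = residual_block(x, w1, w2)

# With a zero data gradient, weight decay alone scales each matrix
# by the factor (1 - lr * weight_decay), so every norm shrinks.
zero_grad = np.zeros_like(w1)
w1_new = sgd_step_with_weight_decay(w1, zero_grad)
w2_new = sgd_step_with_weight_decay(w2, zero_grad)

before = spectral_norm_product([w1, w2])
after = spectral_norm_product([w1_new, w2_new])
print(after < before)
```

In real training the data gradient is nonzero, so the norms need not decrease at every step; the decay term only adds a constant pull toward smaller weights, which is the effect the bound motivates.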