Stability and convergence theory for learning ResNet: A full characterization

Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-yan Liu
2019-01-01
Abstract: The ResNet structure has achieved great success since its debut. In this paper, we study the stability of learning ResNet. Specifically, we consider the ResNet block $h_l = \phi(h_{l-1} + \tau \cdot g(h_{l-1}))$, where $\phi(\cdot)$ is the ReLU activation and $\tau$ is a scalar. We show that, for the standard initialization used in practice, $\tau = 1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks. Specifically, stability is guaranteed for $\tau \le 1/\Omega(\sqrt{L})$, while conversely the forward process explodes when $\tau > L^{-\frac{1}{2}+c}$ for a positive constant $c$. Moreover, if ResNet is properly over-parameterized, we show that for $\tau \le 1/\tilde{\Omega}(\sqrt{L})$ gradient descent is guaranteed to find the global minima \footnote{We use $\tilde{\Omega}(\cdot)$ to hide logarithmic factors.}, which significantly enlarges the range $\tau \le 1/\tilde{\Omega}(L)$ that admits global convergence in previous work. We also demonstrate that the over-parameterization requirement of ResNet depends only weakly on the depth, which corroborates the advantage of ResNet over the vanilla feedforward network. Empirically, with $\tau \le 1/\sqrt{L}$, a deep ResNet can be easily trained even without a normalization layer. Moreover, adding $\tau = 1/\sqrt{L}$ can also improve the performance of ResNet with a normalization layer.
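The forward-stability claim can be illustrated with a small numerical sketch. This is not the authors' experiment. It assumes a residual block of the form h_l = ReLU(h_{l-1} + tau * W_l h_{l-1}), i.e. a linear residual branch, with each W_l drawn i.i.d. from N(0, 1/width). The function name `forward_norm_ratio`, the choice of input, and the depth and width values are all hypothetical and chosen only for illustration.

```python
import math
import random


def forward_norm_ratio(depth, width, tau, seed=0):
    """Return ||h_L|| / ||h_0|| for a randomly initialized ReLU residual net.

    Assumed block (illustrative only): h <- ReLU(h + tau * W h),
    with a fresh W whose entries are N(0, 1/width) at every layer.
    """
    rng = random.Random(seed)
    # Hypothetical input: a nonnegative random vector scaled to unit norm.
    h = [abs(rng.gauss(0.0, 1.0)) for _ in range(width)]
    norm0 = math.sqrt(sum(x * x for x in h))
    h = [x / norm0 for x in h]
    std = 1.0 / math.sqrt(width)
    for _ in range(depth):
        # Residual branch: one row of W is sampled per output coordinate.
        branch = [sum(rng.gauss(0.0, std) * x for x in h) for _ in range(width)]
        # Scaled skip connection followed by ReLU.
        h = [max(0.0, x + tau * b) for x, b in zip(h, branch)]
    # The input has unit norm, so the output norm is already the ratio.
    return math.sqrt(sum(x * x for x in h))


depth, width = 64, 32
for tau in (1.0 / math.sqrt(depth), 1.0):
    ratio = forward_norm_ratio(depth, width, tau)
    print(f"tau = {tau:.3f}: output/input norm ratio = {ratio:.3e}")
```

Under these assumptions, the run with tau = 1/sqrt(L) should keep the output norm within a small constant factor of the input norm, while tau = 1 should make it grow by many orders of magnitude, which matches the qualitative picture in the abstract. The exact numbers depend on the seed, the width, and the assumed initialization.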