Abstract: In this work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding offers a new perspective on the design of effective deep architectures: we can draw on the rich body of knowledge in numerical analysis to guide the design of new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture), inspired by linear multi-step methods for solving ordinary differential equations. The LM-architecture is an effective structure that can be applied to any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt respectively) achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress ($>50\%$) the original networks while maintaining similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Last but not least, we establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating stochastic training strategies to stochastic dynamic systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.
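The discretization view described in the abstract can be illustrated with a small numerical sketch. The abstract does not state the update rules, so the forms below are assumptions: a residual update x_{n+1} = x_n + h f(x_n), which is the forward Euler scheme for dx/dt = f(x), and a generic two-step update x_{n+1} = (1 - k) x_n + k x_{n-1} + h f(x_n). The names `residual_branch`, `euler_resnet`, `linear_multistep`, `h` and `k` are hypothetical, and the scalar function standing in for a learned residual branch is chosen only so the result can be checked against a known solution.

```python
import math


def residual_branch(x):
    # Hypothetical stand-in for a learned residual branch f(x).
    # With f(x) = -x, the ODE dx/dt = f(x) has the exact solution x0 * exp(-t).
    return -x


def euler_resnet(x0, depth, h):
    """Apply `depth` residual updates x_{n+1} = x_n + h * f(x_n) (forward Euler)."""
    x = x0
    for _ in range(depth):
        x = x + h * residual_branch(x)
    return x


def linear_multistep(x0, depth, h, k):
    """Apply an assumed two-step update
    x_{n+1} = (1 - k) * x_n + k * x_{n-1} + h * f(x_n).

    The first step is a plain residual (Euler) step, because a two-step
    rule needs two starting values. `k` is fixed here for simplicity.
    """
    x_prev = x0
    x = x0 + h * residual_branch(x0)
    for _ in range(depth - 1):
        x_prev, x = x, (1 - k) * x + k * x_prev + h * residual_branch(x)
    return x


if __name__ == "__main__":
    depth, h = 100, 0.01  # 100 "layers" covering the time interval [0, 1]
    print("Euler/ResNet-style :", euler_resnet(1.0, depth, h))
    print("exact exp(-1)      :", math.exp(-1.0))
    print("two-step, k = 0    :", linear_multistep(1.0, depth, h, k=0.0))
```

With `k = 0` the two-step rule reduces to the residual update, so the two functions agree. For nonzero `k` the update also mixes in the previous layer's state, which is the kind of extra coupling a multi-step scheme introduces; how such a coefficient is parameterized or trained in the actual LM-architecture is not specified in this abstract.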

Deep Limits of Residual Neural Networks

Implicit regularization of deep residual networks towards neural ODEs

Neural ODEs as the deep limit of ResNets with constant weights

Convergence Analysis of Deep Residual Networks

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Generalization bounds for neural ordinary differential equations and deep residual networks

Neural Generalized Ordinary Differential Equations with Layer-varying Parameters

Mean-Field and Kinetic Descriptions of Neural Differential Equations

Theory IIIb: Generalization in Deep Networks

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

Universal Approximation Power of Deep Residual Neural Networks via Nonlinear Control Theory

Generalization of Scaled Deep ResNets in the Mean-Field Regime

Variational formulations of ODE-Net as a mean-field optimal control problem and existence results

Infinite‐width limit of deep linear neural networks

Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)

Deep Residual Learning for Nonlinear Regression

Error estimates of residual minimization using neural networks for linear PDEs

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

Deep linear networks for regression are implicitly regularized towards flat minima

Scaling ResNets in the Large-depth Regime

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks