Training of deep residual networks with stochastic MG/OPT

Cyrill von Planta, Alena Kopanicakova, Rolf Krause
DOI: https://doi.org/10.48550/arXiv.2108.04052
IF: 5.414
2021-08-09
Machine Learning
Abstract: We train deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT. To build the multilevel hierarchy, we use the dynamical systems viewpoint specific to residual networks. We report significant speed-ups and additional robustness for training deep residual networks on MNIST. Our numerical experiments also indicate that multilevel training can be used as a pruning technique, as many of the auxiliary networks have accuracies comparable to the original network.
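The dynamical systems viewpoint mentioned in the abstract treats a residual network as a forward Euler discretization of an ODE, which suggests building coarser auxiliary networks by using fewer, larger time steps. The following is a minimal sketch of that idea, not the authors' implementation: the `tanh` layer, the function names `residual_forward` and `coarsen`, and the coarsening rule (keep every second layer, double the step size) are all assumptions made for illustration, and the abstract does not specify the transfer operators actually used.

```python
import numpy as np


def residual_forward(x, weights, h):
    """Forward Euler view of a residual network: x_{k+1} = x_k + h * tanh(W_k x_k)."""
    for W in weights:
        x = x + h * np.tanh(W @ x)
    return x


def coarsen(weights, h):
    """Hypothetical coarsening rule: keep every second layer, double the step size."""
    return weights[::2], 2.0 * h


rng = np.random.default_rng(0)
dim, depth = 4, 8
h = 1.0 / depth  # step size chosen so the total "time" depth * h equals 1
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]

# Build one coarser level of the hierarchy from the fine network.
coarse_weights, coarse_h = coarsen(weights, h)

x0 = rng.standard_normal(dim)
fine_out = residual_forward(x0, weights, h)
coarse_out = residual_forward(x0, coarse_weights, coarse_h)
```

Under these assumptions the coarse network has half as many layers but covers the same total time interval, which is what makes it a plausible cheaper surrogate for the fine network in a multilevel scheme.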