Abstract: Recent work has suggested that feedforward residual neural networks (ResNets) approximate iterative recurrent computations. Iterative computations are useful in many domains, so they might provide good solutions for neural networks to learn. However, principled methods for measuring and manipulating iterative convergence in neural networks remain lacking. Here we address this gap by 1) quantifying the degree to which ResNets learn iterative solutions and 2) introducing a regularization approach that encourages the learning of iterative solutions. Iterative methods are characterized by two properties: iteration and convergence. To quantify these properties, we define three indices of iterative convergence. Consistent with previous work, we show that, even though ResNets can express iterative solutions, they do not learn them when trained conventionally on computer-vision tasks. We then introduce regularizations to encourage iterative convergent computation and test whether this provides a useful inductive bias. To make the networks more iterative, we manipulate the degree of weight sharing across layers using soft gradient coupling. This new method provides a form of recurrence regularization and can interpolate smoothly between an ordinary ResNet and a “recurrent” ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). To make the networks more convergent, we impose a Lipschitz constraint on the residual functions using spectral normalization. The three indices of iterative convergence reveal that the gradient coupling and the Lipschitz constraint succeed at making the networks iterative and convergent, respectively.
To showcase the practicality of our approach, we study how iterative convergence impacts generalization on standard visual recognition tasks (MNIST, CIFAR-10, CIFAR-100) and on a challenging recognition task with partial occlusions (Digitclutter). We find that, in these tasks, iterative convergent computation does not provide a useful inductive bias for ResNets. Importantly, our approach may be useful for investigating other network architectures and tasks as well, and we hope that our study provides a useful starting point for investigating the broader question of whether iterative convergence can help neural networks generalize.
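
The two regularizers named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and the abstract does not give the exact formulas. The sketch assumes that soft gradient coupling blends each layer's gradient with the mean gradient across layers, controlled by a hypothetical `coupling` parameter in [0, 1]. It also assumes that the Lipschitz constraint rescales a weight matrix whenever its spectral norm, estimated by power iteration, exceeds a hypothetical bound `max_norm`. All function and parameter names are illustrative.

```python
import math


def couple_gradients(layer_grads, coupling):
    """Blend each layer's gradient with the mean gradient across layers.

    Assumed form (not taken from the abstract): coupling = 0.0 leaves the
    gradients untouched (ordinary ResNet); coupling = 1.0 gives every layer
    the same mean gradient, so identically initialised layers stay identical
    (a "recurrent" ResNet).
    """
    if not 0.0 <= coupling <= 1.0:
        raise ValueError("coupling must lie in [0, 1]")
    n_layers = len(layer_grads)
    n_params = len(layer_grads[0])
    mean_grad = [
        sum(g[i] for g in layer_grads) / n_layers for i in range(n_params)
    ]
    return [
        [(1.0 - coupling) * g[i] + coupling * mean_grad[i] for i in range(n_params)]
        for g in layer_grads
    ]


def spectral_norm(matrix, n_iters=50):
    """Estimate the largest singular value of a matrix by power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length starting vector
    sigma = 0.0
    for _ in range(n_iters):
        # u = W v; its length is the current estimate of the spectral norm.
        u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            return 0.0
        u = [x / sigma for x in u]
        # v = W^T u, renormalised for the next step.
        v = [sum(matrix[r][c] * u[r] for r in range(rows)) for c in range(cols)]
        v_norm = math.sqrt(sum(x * x for x in v))
        if v_norm == 0.0:
            return 0.0
        v = [x / v_norm for x in v]
    return sigma


def lipschitz_constrain(matrix, max_norm=1.0, n_iters=50):
    """Rescale a weight matrix so its spectral norm is at most max_norm.

    Matrices already within the bound are returned unchanged (a copy).
    """
    sigma = spectral_norm(matrix, n_iters)
    scale = max(1.0, sigma / max_norm)
    return [[w / scale for w in row] for row in matrix]
```

For a linear map, the Lipschitz constant under the Euclidean norm equals the spectral norm, which is why bounding the largest singular value bounds how much a layer can stretch its input. In this sketch, calling `couple_gradients(grads, 0.5)` moves every layer's gradient halfway toward the shared mean, which corresponds to an intermediate point between the ordinary and the recurrent regimes described above.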

Convergent Learning: Do different neural networks learn the same representations?

Towards Understanding Learning Representations: to What Extent Do Different Neural Networks Learn the Same Representation.

To What Extent Do Different Neural Networks Learn the Same Representation: A Neuron Activation Subspace Match Approach

Bridging the Semantic Latent Space Between Brain and Machine: Similarity is All You Need

Learning Representation for Multiple Biological Networks Via a Robust Graph Regularized Integration Approach

Validating Siamese embedded neural networks with identical representations for efficient model convergence

A New Concept of Multiple Neural Networks Structure Using Convex Combination

When and where do feed-forward neural networks learn localist representations?

On Privileged and Convergent Bases in Neural Network Representations

No One Representation to Rule Them All: Overlapping Features of Training Methods

Can neural networks benefit from objectives that encourage iterative convergent computations? A case study of ResNets and object classification

Attributing Learned Concepts in Neural Networks to Training Data

Globally Convergent Neural Networks

You Only Learn One Representation: Unified Network for Multiple Tasks

Intra-Model Collaborative Learning of Neural Networks.

Do Convnets Learn Correspondence?

Universality of representation in biological and artificial neural networks

Empirical Studies on the Convergence of Feature Spaces in Deep Learning

Analysis of the Convergency of Topology Preserving Neural Networks on Learning.

Evaluating alignment between humans and neural network representations in image-based learning tasks

A Feature Embedding Strategy for High-Level CNN Representations from Multiple ConvNets