Abstract: A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks that involve sequential data is their ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs' long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth efficiency results on convolutional networks do not suffice to account for the success of deep RNNs on data of varying lengths. To address this, we introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute which may be readily extended to other RNN architectures of interest, e.g., variants of LSTM networks. We obtain our results by considering a class of recurrent networks referred to as Recurrent Arithmetic Circuits, which merge the hidden state with the input via the Multiplicative Integration operation, and empirically demonstrate the discussed phenomena on common RNNs. Finally, we employ the tool of quantum Tensor Networks to gain additional graphic insight regarding the complexity brought forth by depth in recurrent networks.
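
As a rough illustration of the two ideas named in the abstract, the following sketch builds a toy recurrent cell that merges the hidden state with its input by an element-wise product (Multiplicative Integration), and estimates a Start-End rank numerically. It is not the paper's construction: the function names, the all-ones initial state, the random weights and templates, and the tolerance are all assumptions made for illustration. The estimate is the numerical rank of the network's outputs on a finite grid of input sequences, arranged so that rows index the first half of the sequence and columns the second half; such a rank can only lower-bound the separation rank of the underlying function.

```python
import itertools
import numpy as np


def rac_output(seq, layers, w_out):
    """Run a toy Recurrent Arithmetic Circuit over one input sequence.

    Each layer is a pair (W_h, W_x). The hidden state is merged with the
    layer input by an element-wise product (Multiplicative Integration).
    """
    # Assumption: every layer starts from an all-ones hidden state.
    hidden = [np.ones(W_h.shape[0]) for W_h, _ in layers]
    for x in seq:
        layer_in = x
        for i, (W_h, W_x) in enumerate(layers):
            hidden[i] = (W_h @ hidden[i]) * (W_x @ layer_in)
            layer_in = hidden[i]  # the next layer reads this layer's state
    return float(w_out @ hidden[-1])


def start_end_rank(layers, w_out, templates, T, rel_tol=1e-9):
    """Numerical rank of the Start-End matricization of the output grid."""
    n = len(templates)
    values = np.array([
        rac_output([templates[i] for i in idx], layers, w_out)
        for idx in itertools.product(range(n), repeat=T)
    ])
    # Rows index the first T // 2 inputs (Start), columns the rest (End).
    mat = values.reshape(n ** (T // 2), n ** (T - T // 2))
    s = np.linalg.svd(mat, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


rng = np.random.default_rng(0)
R, d, T = 2, 3, 4  # hidden width, input dimension, sequence length
templates = rng.standard_normal((4, d))  # hypothetical input templates


def random_layer(in_dim):
    return rng.standard_normal((R, R)), rng.standard_normal((R, in_dim))


shallow = [random_layer(d)]                   # one recurrent layer
deep = [random_layer(d), random_layer(R)]     # two stacked layers
w_out = rng.standard_normal(R)

shallow_rank = start_end_rank(shallow, w_out, templates, T)
deep_rank = start_end_rank(deep, w_out, templates, T)
print("shallow:", shallow_rank, "deep:", deep_rank)
```

In this toy single-layer cell the hidden state at the midpoint enters the final output linearly, so the estimated rank cannot exceed the hidden width `R`. With a second layer, the output depends on products of midpoint states, which is what allows the rank to grow beyond `R`.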

The Uncanny Similarity of Recurrence and Depth

Depth Enables Long-Term Memory for Recurrent Neural Networks

On the Long-Term Memory of Deep Recurrent Networks

Recurrence along Depth: Deep Convolutional Neural Networks with Recurrent Layer Aggregation

Recurrent issues with deep neural networks of visual recognition

Deep Networks with Stochastic Depth

Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior

R-LKDepth: Recurrent Depth Learning With Larger Kernel

Rethinking the Relationship between Recurrent and Non-Recurrent Neural Networks: A Study in Sparsity

Growing Deep Neural Network Considering with Similarity between Neurons

Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure

Make Deep Networks Shallow Again

Recurrent Feedback Improves Recognition of Partially Occluded Objects

On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization

Don't Forget The Past: Recurrent Depth Estimation from Monocular Video

Unified field theoretical approach to deep and recurrent neuronal networks

Making Neural Programming Architectures Generalize via Recursion

Recurrent networks improve neural response prediction and provide insights into underlying cortical circuits

The Role of Recurrency in Image Segmentation for Noisy and Limited Sample Settings

Neural Networks with a Redundant Representation: Detecting the Undetectable