Abstract: Recurrent neural networks (RNNs) are widely used as memory models for sequence-related problems. Many RNN variants have been proposed to alleviate the gradient problems that arise in training and to process long sequences. Despite these classical models, capturing long-term dependence while still responding to short-term changes remains a challenge. To address this problem, we propose a new model named Dual Recurrent Neural Networks (DuRNN). DuRNN consists of two parts that learn short-term dependence and progressively learn long-term dependence. The first part is a recurrent neural network with constrained full recurrent connections, which handles short-term dependence in the sequence and generates short-term memory. The second part is a recurrent neural network with independent recurrent connections, which helps to learn long-term dependence and generates long-term memory. A selection mechanism between the two parts transfers the needed long-term information to the independent neurons. Multiple modules can be stacked to form a multi-layer model for better performance. Our contributions are: 1) a new recurrent model, based on a divide-and-conquer strategy, that learns long-term and short-term dependence separately; and 2) a selection mechanism that enhances the separation and learning of dependence at different temporal scales. Both theoretical analysis and extensive experiments are conducted to validate the performance of our model. Experimental results indicate that the proposed DuRNN model handles both very long sequences (over 5,000 time steps) and short sequences well.
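
The two-part structure described in the abstract can be illustrated with a minimal NumPy sketch of a single forward pass. The abstract does not give the update equations, so everything specific here is an assumption made for illustration: the class name `DualRecurrentSketch`, the spectral-norm rescaling used as the "constraint" on the full recurrent matrix, the sigmoid gate used as the "selection mechanism", and the choice of activations. It shows one plausible reading of the architecture, not the authors' exact formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def constrain_spectral_norm(W, max_norm=1.0):
    # Assumed constraint: rescale W so its largest singular value is <= max_norm.
    largest = np.linalg.svd(W, compute_uv=False)[0]
    return W if largest <= max_norm else W * (max_norm / largest)


class DualRecurrentSketch:
    """Hypothetical single module: short-term part, selection gate, long-term part."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.hidden_size = hidden_size
        # Part 1: full (dense) recurrent connections, constrained.
        self.W_in = rng.normal(0.0, scale, (hidden_size, input_size))
        self.W_rec = constrain_spectral_norm(
            rng.normal(0.0, scale, (hidden_size, hidden_size))
        )
        # Selection mechanism (assumed here to be a sigmoid gate).
        self.W_sel = rng.normal(0.0, scale, (hidden_size, hidden_size))
        # Part 2: independent recurrent connections, one weight per neuron.
        self.W_long = rng.normal(0.0, scale, (hidden_size, hidden_size))
        self.u = rng.uniform(-1.0, 1.0, hidden_size)

    def forward(self, xs):
        h_short = np.zeros(self.hidden_size)
        h_long = np.zeros(self.hidden_size)
        outputs = []
        for x in xs:
            # Short-term memory: every neuron sees every other neuron's state.
            h_short = np.tanh(self.W_in @ x + self.W_rec @ h_short)
            # Select which short-term information is passed on.
            selected = sigmoid(self.W_sel @ h_short) * h_short
            # Long-term memory: each neuron recurs only on its own past state.
            h_long = np.maximum(0.0, self.W_long @ selected + self.u * h_long)
            outputs.append(h_long)
        return np.stack(outputs)
```

Calling `DualRecurrentSketch(3, 8).forward(xs)` on an array `xs` of shape `(T, 3)` returns the long-term states with shape `(T, 8)`. The element-wise product `self.u * h_long` is what makes the second part "independent": no neuron's past state feeds another neuron's recurrence, in contrast to the dense `W_rec` of the first part.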

Learning various length dependence by dual recurrent neural networks
