Abstract:<p>Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an RNN is routinely presented without justification throughout the literature. The goal of this tutorial is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in Signal Processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the "Vanilla LSTM"<a class="workspace-trigger" href="#fn1"><sup>1</sup></a> network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasize ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this treatise valuable as well.</p>

Training recurrent neural networks

Residual Recurrent Neural Networks for Learning Sequential Representations.

A Critical Review of Recurrent Neural Networks for Sequence Learning

Learning The Sequential Temporal Information with Recurrent Neural Networks

The recurrent temporal restricted boltzmann machine

Sequence Classification Restricted Boltzmann Machines With Gated Units

Temporal-kernel recurrent neural networks

Recurrently Controlled Recurrent Networks

Linear-Time Sequence Classification using Restricted Boltzmann Machines

Recurrent Neural Networks (RNNs): A gentle Introduction and Overview

Recognizing recurrent neural networks (rRNN): Bayesian inference for recurrent neural networks

Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences

Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network

Sequence modeling: recurrent and recursive nets

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Learning to execute

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning

Encoding Sensory and Motor Patterns as Time-Invariant Trajectories in Recurrent Neural Networks

Reversible Recurrent Neural Networks

An empirical exploration of recurrent network architectures