Abstract: Previous RNN architectures have largely been superseded by LSTM, or "Long Short-Term Memory". Since its introduction, there have been many variations on this simple design. However, it is still widely used and we are not aware of a gated-RNN architecture that outperforms LSTM in a broad sense while still being as simple and efficient. In this paper we propose a modified LSTM-like architecture. Our architecture is still simple and achieves better performance on the tasks that we tested on. We also introduce a new RNN performance benchmark that uses the handwritten digits and stresses several important network capabilities.

What problem does this paper attempt to address?

The paper addresses several limitations of the existing LSTM (Long Short-Term Memory) architecture, with the aim of improving its performance on sequence tasks. Specifically, it identifies three main problems:

1. **Exponential decay of memory caused by the forget gate**: The forget gate imposes an exponential decay on the memory unit, which may be inappropriate in some cases. For example, when the model needs to retain certain information for a long time, this decay may weaken that information prematurely.
2. **Limited information exchange between memory units**: Memory units cannot directly communicate or exchange information unless the input and output gates are open. This limits the information flow within the memory unit, making it difficult for the model to manage complex internal states.
3. **Saturation of the hyperbolic tangent activation function**: LSTM uses the hyperbolic tangent ($\tanh$) as an activation function. When the input is large in magnitude, the gradient of $\tanh$ becomes very small, which leads to vanishing gradients and hampers training.

To solve these problems, the authors propose a modified LSTM architecture called LSTM with Working Memory (LSTWM). Its main changes are:

- **Replacing the forget gate with a functional layer**: LSTWM introduces a functional layer located between the input gate and the output gate. It combines the current memory unit value with the output of this functional layer through a convex combination, instead of simply multiplying by the output of the forget gate.
- **Using a logarithm-based activation function**: LSTWM experiments with a logarithm-based activation function to avoid the saturation that traditional activation functions (such as $\tanh$) exhibit for large inputs, thereby improving the performance of the model.
- **Enhancing information exchange within the memory unit**: By introducing an additional functional layer, LSTWM allows more flexible information exchange between memory units without relying on the open or closed states of the input and output gates.

To verify the effectiveness of these changes, the authors conducted experiments on multiple tasks, including text prediction tasks and a task that combines digit recognition with addition. The experimental results show that LSTWM performs better on these tasks, especially when using the logarithm-based activation function.

In summary, the main objective of this paper is to overcome the limitations of LSTM in handling long-term dependencies and complex sequence data by modifying the LSTM architecture, thereby improving the performance and efficiency of the model.
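The three mechanisms discussed above can be illustrated numerically. The following is a minimal sketch, not the paper's implementation. The function names are hypothetical, the logarithm-based activation is assumed to take the form $\operatorname{sign}(x)\ln(1+|x|)$ (the summary does not give its exact definition), and the convex update is a scalar stand-in for the functional layer that the summary describes only in words.

```python
import math


def decayed_memory(c0: float, forget: float, steps: int) -> float:
    """Cell value after `steps` updates with a constant forget-gate
    output and no new input: the value shrinks as forget ** steps."""
    return c0 * forget ** steps


def tanh_grad(x: float) -> float:
    """Derivative of tanh, which approaches 0 as |x| grows (saturation)."""
    return 1.0 - math.tanh(x) ** 2


def signed_log(x: float) -> float:
    """ASSUMED logarithm-based activation: sign(x) * ln(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)


def signed_log_grad(x: float) -> float:
    """Derivative of signed_log: 1 / (1 + |x|), which decays only
    polynomially rather than exponentially."""
    return 1.0 / (1.0 + abs(x))


def convex_update(c_prev: float, candidate: float, gate: float) -> float:
    """Convex combination of the old memory value and a candidate value.
    `gate` must lie in [0, 1]; the result stays between the two inputs.
    This is a hypothetical scalar illustration, not the LSTWM equations."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must be in [0, 1]")
    return gate * c_prev + (1.0 - gate) * candidate


if __name__ == "__main__":
    # Forget-gate decay: even a gate value of 0.9 erases most of the memory.
    for steps in (1, 10, 50):
        print(steps, decayed_memory(1.0, 0.9, steps))

    # Gradient comparison at increasingly large inputs.
    for x in (0.5, 2.0, 5.0, 10.0):
        print(x, tanh_grad(x), signed_log_grad(x))

    # The convex update keeps the new value between memory and candidate.
    print(convex_update(2.0, -1.0, 0.75))
```

Under these assumptions, the gradient of the logarithm-based function shrinks like $1/(1+|x|)$, while the gradient of $\tanh$ shrinks exponentially in $|x|$, which is the saturation contrast the summary refers to.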

LSTM with Working Memory
