Abstract: Recurrent Neural Networks (RNNs) have been successfully applied to many sequence learning problems, such as handwriting recognition, image description, natural language processing, and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among these, the Gated Recurrent Unit (GRU) is one of the most widely used RNN models. However, the GRU lacks the capability to adaptively pay attention to certain regions or locations, which may cause information redundancy or loss during learning. In this paper, we propose an RNN model, called Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of the GRU by adding an attention gate. The attention gate can enhance the GRU's ability to retain long-term memory and help memory cells quickly discard unimportant content. RAU is capable of extracting information from sequential data by adaptively selecting a sequence of regions or locations and paying more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification, and language modeling show that RAU consistently outperforms GRU and other baseline methods.

### What problem does this paper attempt to solve?

This paper aims to solve the problem that the existing Gated Recurrent Unit (GRU) lacks the ability to adaptively focus on specific regions or positions when processing sequence data. Specifically:

1. **Information Redundancy and Loss**: Although the traditional GRU model can remember long-term information, it cannot adaptively focus on the important parts of the input, which may lead to information redundancy or loss during learning.
2. **Enhancing Long-term Memory and Quickly Discarding Unimportant Information**: To overcome this problem, the authors propose a new RNN architecture, the **Recurrent Attention Unit (RAU)**, which seamlessly integrates the attention mechanism into the GRU.
3. **Improving the Efficiency and Accuracy of Sequence Data Processing**: RAU can more effectively extract key information and ignore unimportant content in tasks such as image classification, sentiment classification, and language modeling, thereby improving model performance.

### Main contributions of the paper

1. **Proposing a new RNN architecture, RAU**: Experimental results show that RAU significantly outperforms LSTM and GRU models on multiple tasks.
2. **Simplifying the model structure**: The attention mechanism is usually attached to the original RNN as an additional layer, whereas RAU adds the attention gate directly to the memory unit of the GRU, making the model simpler and easier to train.
3. **Wide Applicability**: RAU is applicable not only to computer vision but also to sequence-related problems in general, such as image classification, language modeling, and sentiment classification.

### Formula Representation

- **Update Gate**: \[ z_t = \sigma\left(W_z\left[x_t, h_{t-1}\right] + b_z\right) \]
- **Reset Gate**: \[ r_t = \sigma\left(W_r\left[x_t, h_{t-1}\right] + b_r\right) \]
- **Candidate Value**: \[ \tilde{h}_t = \tanh\left(W_{\tilde{h}}\left[x_t, r_t \odot h_{t-1}\right] + b_{\tilde{h}}\right) \]
- **Hidden State Update**: \[ h_t = (1 - z_t) \odot h_{t-1} + \tfrac{1}{2}\, z_t \odot \tilde{h}_t + \tfrac{1}{2}\, z_t \odot \hat{h}_t \]

where $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, $\odot$ is element-wise multiplication, $W$ and $b$ are the weight matrices and bias terms respectively, and $\hat{h}_t$ is the output of the attention gate. Through these improvements, RAU can more effectively capture important information while reducing the impact of redundant information when processing complex sequence data.
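
The four equations above can be traced step by step in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: the parameter names (`W_z`, `W_h`, and so on) and the helper functions are illustrative. Because the attention term $\hat{h}_t$ is not defined in this summary, it is supplied by a hypothetical caller-provided function `attention_gate`.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]


def rau_step(x, h_prev, p, attention_gate):
    """One hidden-state update following the equations above.

    p maps illustrative names (W_z, b_z, W_r, b_r, W_h, b_h) to parameters.
    attention_gate(x, h_prev) is a hypothetical stand-in that returns h_hat_t;
    its real definition is not part of this summary.
    """
    xh = x + h_prev  # concatenation [x_t, h_{t-1}]
    z = sigmoid(affine(p["W_z"], xh, p["b_z"]))  # update gate
    r = sigmoid(affine(p["W_r"], xh, p["b_r"]))  # reset gate
    x_rh = x + [ri * hi for ri, hi in zip(r, h_prev)]  # [x_t, r_t * h_{t-1}]
    h_tilde = [math.tanh(a) for a in affine(p["W_h"], x_rh, p["b_h"])]
    h_hat = attention_gate(x, h_prev)
    # h_t = (1 - z) * h_prev + z * h_tilde / 2 + z * h_hat / 2
    return [
        (1 - zi) * hp + zi * ht / 2 + zi * ha / 2
        for zi, hp, ht, ha in zip(z, h_prev, h_tilde, h_hat)
    ]


# Toy check: input size 1, hidden size 2, all weights and biases zero,
# so z = 0.5 and h_tilde = 0 for every unit.
params = {name: [[0.0] * 3 for _ in range(2)] for name in ("W_z", "W_r", "W_h")}
params.update({name: [0.0, 0.0] for name in ("b_z", "b_r", "b_h")})
h = rau_step([1.0], [1.0, -1.0], params, lambda x, h_prev: [1.0, 1.0])
print(h)  # [0.75, -0.25]
```

With zero weights, each unit keeps half of its previous state and adds a quarter of the attention term, which gives the printed values. Note also that if $\hat{h}_t$ equals $\tilde{h}_t$, the two half-weighted terms sum to $z_t \odot \tilde{h}_t$ and the update takes the familiar GRU form.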

Recurrent Attention Unit

Recurrent attention unit: A new gated recurrent unit for long-term memory of important parts in sequential data

Recurrent Attention Unit - A Simple and Effective Method for Traffic Prediction.

Recurrent Neural Unit with Frequency Attention for Specific Emitter Identification

Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit

Residual Recurrent Neural Networks for Learning Sequential Representations.

Attention Recurrent Neural Networks for Image-Based Sequence Text Recognition.

A Novel Design for a Gated Recurrent Network with Attentional Memories

Adding Attentiveness to the Neurons in Recurrent Neural Networks

EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks

Attention with Long-Term Interval-Based Gated Recurrent Units for Modeling Sequential User Behaviors

Adaptive Attention-Aware Gated Recurrent Unit for Sequential Recommendation

Attention-Based Recurrent Neural Network for Sequence Labeling.

Combining Gated Recurrent Unit and Attention Pooling for Sentimental Classification.

Gates Are Not What You Need in RNNs

Attention as an RNN

Minimal Gated Unit for Recurrent Neural Networks

Prototypical Recurrent Unit

Gated recurrent neural networks discover attention

Recurrently Controlled Recurrent Networks

Recurrent Neural Networks with Auxiliary Memory Units