Abstract: The Recurrent Neural Network (RNN) and its variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of a sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable to or even better than that of classical RNNs. The new proposal, referred to as the Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50% compression rate. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.

What problem does this paper attempt to address?

The main problem this paper addresses is how to reduce the number of parameters in Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), while maintaining or improving performance. Specifically, the paper proposes a new architecture, the Restricted Recurrent Neural Network (RRNN), which significantly reduces the parameter count by making the weight matrices for the input data and the hidden state share a large proportion of their parameters. This reduces model complexity and avoids the pre-training and sophisticated parameter fine-tuning that most existing compression techniques require. Experiments show that, compared with its classical counterpart, the RRNN generally produces comparable results at a compression rate of about 50%. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.

### Main contributions of the paper:

1. **Proposing a new model compression technique**: The technique specifically exploits the recurrent structure of RNNs and, unlike traditional model compression techniques, does not require retraining a pre-trained model.
2. **Explicit control of the compression rate**: Unlike deep compression methods based on regularization, this method directly associates the regularization parameter with an exact compression rate.
3. **Compatibility with existing regularization techniques**: Because only the structure of the weight matrices changes, the method is compatible with existing regularization techniques such as Dropout.
4. **Experimental results verifying effectiveness**: Experiments show that parameter sharing can not only reduce model complexity but also improve model performance at a small compression rate.

### Method overview:

- **Restricted Recurrent Neural Networks (RRNN)**: The number of parameters is reduced by sharing some parameters between the weight matrix applied to the input and the one applied to the hidden state. The underlying assumption is that the input and the hidden state are related to some extent; the shared parameters capture this relation, while each of the two retains sufficient degrees of freedom of its own.
- **General RRNN extended to LSTM and GRU**: The same parameter-sharing strategy can be applied to LSTM and GRU, reducing the number of parameters by enforcing parameter sharing in the gating mechanism.

### Experimental results:

- **Experiments on the Penn Treebank and WikiText-2 datasets**: The RRNN maintains good performance across different sharing rates, and the performance improvement is reported to be more significant when the compression rate is high.
- **Comparison with existing models**: Compared with existing compression models, the RRNN can match or even exceed the performance of the classical RNN while keeping the number of parameters low.

### Conclusion:

The paper proposes a new model compression method, the Restricted Recurrent Neural Network (RRNN). By sharing parameters between the input and the hidden state, it effectively reduces the number of parameters in an RNN while maintaining or improving model performance. The method needs no pre-training or retraining step and applies to several RNN variants, including LSTM and GRU.
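
The parameter-sharing idea summarized above can be illustrated with a small NumPy sketch. It is not the paper's exact formulation: it assumes, purely for illustration, that the input and hidden state have the same size `d`, that sharing is done column-wise (the first `m` columns of both weight matrices are the same block), and that biases are ignored in the parameter count. The names `make_restricted_weights`, `rrnn_step`, `count_params`, and `sharing_rate` are hypothetical.

```python
import numpy as np


def make_restricted_weights(d, sharing_rate, rng):
    """Create one shared block plus two private blocks.

    Assumption (not taken from the paper): W_x = [shared | own_x] and
    W_h = [shared | own_h], with input size == hidden size == d.
    """
    m = int(round(sharing_rate * d))  # number of shared columns
    shared = rng.normal(scale=0.1, size=(d, m))
    own_x = rng.normal(scale=0.1, size=(d, d - m))
    own_h = rng.normal(scale=0.1, size=(d, d - m))
    return shared, own_x, own_h


def rrnn_step(x, h, shared, own_x, own_h, b):
    """One recurrent step: tanh(W_x x + W_h h + b) without building W_x, W_h."""
    m = shared.shape[1]
    # The shared block multiplies the first m entries of both x and h,
    # so it can be applied once to their sum.
    pre = shared @ (x[:m] + h[:m]) + own_x @ x[m:] + own_h @ h[m:] + b
    return np.tanh(pre)


def count_params(d, sharing_rate):
    """Weight counts (biases excluded) for the restricted vs. classical cell."""
    m = int(round(sharing_rate * d))
    restricted = d * m + 2 * d * (d - m)
    classical = 2 * d * d
    return restricted, classical


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    shared, own_x, own_h = make_restricted_weights(d, 0.5, rng)
    h = np.zeros(d)
    for x in rng.normal(size=(5, d)):  # a toy sequence of 5 inputs
        h = rrnn_step(x, h, shared, own_x, own_h, np.zeros(d))
    print(h.shape, count_params(d, 0.5))
```

Under this assumed layout, the weight count falls linearly with the sharing rate, and sharing every column halves it relative to the unrestricted cell. How the paper itself maps its sharing rate to a compression rate, and how the sharing is arranged inside LSTM and GRU gates, is not specified in the summary above.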

Restricted Recurrent Neural Networks

Residual Recurrent Neural Networks for Learning Sequential Representations.

DRRNets: Dynamic Recurrent Routing Via Low-Rank Regularization in Recurrent Neural Networks.

Reversible Recurrent Neural Networks

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

Compressing Recurrent Neural Network with Tensor Train

Recurrently Controlled Recurrent Networks

MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks

Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

Compressing Recurrent Neural Network Models Through Principal Component Analysis

Were RNNs All We Needed?

Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

GhostRNN: Reducing State Redundancy in RNN with Cheap Operations

On extended long short-term memory and dependent bidirectional recurrent neural network

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Gates Are Not What You Need in RNNs

Accelerating Recurrent Neural Networks: A Memory-Efficient Approach

Block-Sparse Recurrent Neural Networks