Investigating Sparsity in Recurrent Neural Networks

Harshil Darji
DOI: https://doi.org/10.13140/RG.2.2.30539.20004
2024-07-30
Abstract: In the past few years, neural networks have evolved from simple Feedforward Neural Networks to more complex architectures, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). While CNNs suit tasks where sequence order does not matter, such as image recognition, RNNs are useful when order matters, as in machine translation. Increasing the number of layers in a neural network is one way to improve its performance, but it also increases the network's complexity, making training more time- and power-consuming. One way to tackle this problem is to introduce sparsity into the architecture of the neural network. Pruning is one of many methods for making a neural network architecture sparse: it removes weights below a certain threshold while keeping performance close to that of the original network. Another way is to generate arbitrary structures using random graphs and embed them between the input and output layers of an Artificial Neural Network. In recent years, many researchers have focused on pruning CNNs, while hardly any research has addressed pruning RNNs. The same holds for creating sparse RNN architectures by generating and embedding arbitrary structures. Therefore, this thesis investigates the effects of these two techniques on the performance of RNNs. We first describe the pruning of RNNs, its impact on their performance, and the number of training epochs required to regain accuracy after pruning. Next, we describe the creation and training of Sparse Recurrent Neural Networks and identify the relation between their performance and the graph properties of the underlying arbitrary structure. We perform these experiments on an RNN with Tanh nonlinearity (RNN-Tanh), an RNN with ReLU nonlinearity (RNN-ReLU), a GRU, and an LSTM. Finally, we analyze and discuss the results of both experiments.
Machine Learning
What problem does this paper attempt to address?
This paper primarily explores the issue of sparsity in Recurrent Neural Networks (RNNs) and achieves network structure sparsification through two methods: weight pruning and sparse network structures based on random graph generation.

### 1. Weight Pruning

The paper first investigates the impact of weight pruning on the accuracy of Recurrent Neural Networks. Specifically, it explores:

- Pruning weights from input-to-hidden and hidden-to-hidden layers simultaneously.
- Pruning weights from input-to-hidden layers alone.
- Pruning weights from hidden-to-hidden layers alone.

By pruning different proportions of weights from trained models and retraining these pruned models, the study examines how many training epochs are required to recover accuracy. This helps determine the maximum acceptable pruning ratio and the number of training epochs needed to restore accuracy after a significant drop.

### 2. Randomly Structured Recurrent Neural Networks

Additionally, the paper analyzes the performance of randomly structured Recurrent Neural Networks. This method involves generating random structures and embedding them into artificial neural networks to create Sparse-RNNs. These random structures are produced by random graph generators and then embedded between the input and output layers. The experiments evaluate the relationship between the performance of randomly structured recurrent networks and the graph properties of their internal structures.

### Summary

The main objective of the paper is to explore the impact of sparse structures on the performance of Recurrent Neural Networks and their variants (such as Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs)) through the aforementioned methods.
By addressing a series of research questions, including the impact of pruning on accuracy, acceptable pruning ratios, the number of training epochs required to recover accuracy, and the correlation between the performance of randomly structured networks and their internal graph properties, the paper aims to provide a theoretical foundation and empirical evidence for understanding how to effectively reduce the complexity of Recurrent Neural Networks and improve their efficiency.
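
The two ideas summarized above can be illustrated with a small sketch in plain Python. It is not the paper's implementation, and several things in it are assumptions made for illustration: the helper names (`magnitude_mask`, `apply_mask`, `sparsity`, `random_graph`, `density`), the toy weight matrices `w_ih` and `w_hh`, the choice of a G(n, p) random graph as the generator, and density as the example graph property. The source does not say which generators or properties were used. The first half prunes a chosen fraction of the smallest-magnitude weights from the input-to-hidden and hidden-to-hidden matrices separately. The second half generates a random graph and computes one property that could later be compared with model performance.

```python
import random


def magnitude_mask(weights, fraction):
    """Return a 0/1 mask that zeroes the `fraction` of smallest-magnitude weights."""
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * fraction)  # how many weights to prune
    if k == 0:
        return [[1] * len(row) for row in weights]
    threshold = magnitudes[k - 1]  # largest magnitude that is still pruned
    # Ties at the threshold are pruned too, so slightly more than k may be removed.
    return [[0 if abs(w) <= threshold else 1 for w in row] for row in weights]


def apply_mask(weights, mask):
    """Element-wise product of a weight matrix and a 0/1 mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    entries = [x for row in matrix for x in row]
    return sum(1 for x in entries if x == 0) / len(entries)


def random_graph(n, p, seed=0):
    """Adjacency matrix of an undirected G(n, p) random graph without self-loops."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def density(adj):
    """Edges present divided by edges possible in an undirected simple graph."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


# Hypothetical toy weights: 3 hidden units, 4 input features.
w_ih = [[0.9, -0.1, 0.4, 0.05],
        [-0.7, 0.2, -0.03, 0.6],
        [0.15, -0.8, 0.5, -0.25]]
w_hh = [[0.3, -0.02, 0.7],
        [0.01, -0.6, 0.45],
        [-0.9, 0.08, 0.12]]

# Prune each matrix on its own, mirroring the "alone" settings listed above.
pruned_ih = apply_mask(w_ih, magnitude_mask(w_ih, 0.5))
pruned_hh = apply_mask(w_hh, magnitude_mask(w_hh, 0.5))

# A random structure and one graph property that could be related to accuracy.
graph = random_graph(n=8, p=0.3, seed=42)
graph_density = density(graph)
```

In a real experiment, the mask would typically be reapplied after every retraining step so that pruned weights stay at zero. Counting the epochs until accuracy recovers then gives the quantity the study measures. How a random graph is turned into layers of a Sparse-RNN is not described in the summary, so the sketch stops at generating the graph and measuring it.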