Abstract: A common technique for ameliorating the computational costs of running large neural models is sparsification, or the removal of neural connections during training. Sparse models are capable of maintaining the high accuracy of state of the art models, while functioning at the cost of more parsimonious models. The structures which underlie sparse architectures are, however, poorly understood and not consistent between differently trained models and sparsification schemes. In this paper, we propose a new technique for sparsification of recurrent neural nets (RNNs), called moduli regularization, in combination with magnitude pruning. Moduli regularization leverages the dynamical system induced by the recurrent structure to induce a geometric relationship between neurons in the hidden state of the RNN. By making our regularizing term explicitly geometric, we provide the first, to our knowledge, a priori description of the desired sparse architecture of our neural net. We verify the effectiveness of our scheme for navigation and natural language processing RNNs. Navigation is a structurally geometric task, for which there are known moduli spaces, and we show that regularization can be used to reach 90% sparsity while maintaining model performance only when coefficients are chosen in accordance with a suitable moduli space. Natural language processing, however, has no known moduli space in which computations are performed. Nevertheless, we show that moduli regularization induces more stable recurrent neural nets with a variety of moduli regularizers, and achieves high fidelity models at 98% sparsity.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper aims to solve the **sparsification problem in Recurrent Neural Networks (RNNs)**, specifically by introducing a new technique, moduli regularization, combined with magnitude pruning, to achieve a more efficient and stable sparsified model.

#### Background and problem description

1. **High computational cost**: Large neural networks have high computational costs during training and inference. To reduce these costs, researchers have attempted to reduce the number of connections in neural networks through sparsification.
2. **Instability of sparse architectures**: Although existing sparsification methods can reduce the number of parameters, the sparse architectures are unstable and are easily affected by the random initialization of weights, leading to a decline in model performance. In addition, the sparse architectures obtained from different training processes and sparsification schemes vary greatly, lacking consistency and interpretability.
3. **Underutilization of geometric structures**: Recurrent neural networks have potential geometric properties, but these properties are not fully utilized during the sparsification process. In particular, for navigation tasks, there are known specific moduli spaces, but it is not clear whether there are similar geometric structures in Natural Language Processing (NLP) tasks.

#### Core contributions of the paper

1. **Introduction of moduli regularization**: The paper proposes a new regularization method, moduli regularization. This method embeds the neurons of the hidden state into a metric space and adjusts the weights according to the geometric distance between neurons, thereby inducing a sparsified network structure.
2. **Combination with magnitude pruning**: The combination of moduli regularization and magnitude pruning can significantly reduce the number of parameters in the model while maintaining model performance. Experiments show that in navigation tasks, moduli regularization can maintain model performance at a 90% sparsification level; in natural language processing tasks, even at 98% sparsification, the model still exhibits high fidelity.
3. **Verification of the stability of sparse architectures**: The paper verifies that the sparse architectures generated by moduli regularization are more stable than traditional methods by retraining the sparsified model. In particular, in navigation tasks, the sparse architectures generated by moduli regularization can still maintain good performance after re-initializing the weights.
4. **Exploration of the influence of geometric structures on sparsification**: The paper explores the influence of different geometric structures (such as tori, Klein bottles, etc.) on the sparsification effect and finds that some geometric structures (such as tori) can better support sparsification, while other structures (such as spheres) have a poorer effect.

#### Conclusion

By introducing moduli regularization, the paper provides a new sparsification method that can significantly reduce the number of parameters while maintaining model performance and generate more stable sparse architectures. This is of great significance for improving the computational efficiency and stability of Recurrent Neural Networks, especially in resource-constrained environments.
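
#### Illustrative sketch

The two ingredients described in this summary, a penalty on recurrent weights that grows with the distance between neurons embedded in a metric space, followed by magnitude pruning, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the grid embedding on a flat torus, the use of the raw distance as the penalty coefficient, and all function names (`torus_embedding`, `torus_distances`, `moduli_penalty`, `magnitude_prune`) are hypothetical choices made only for this example.

```python
import numpy as np

def torus_embedding(n_side):
    """Place n_side**2 hidden neurons on a regular grid of the unit flat torus.
    (Assumed embedding; the paper may choose points differently.)"""
    ticks = np.arange(n_side) / n_side
    xs, ys = np.meshgrid(ticks, ticks, indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)  # shape (n, 2)

def torus_distances(points):
    """Pairwise distances on the flat torus [0, 1)^2 with wrap-around."""
    diff = np.abs(points[:, None, :] - points[None, :, :])
    diff = np.minimum(diff, 1.0 - diff)  # shortest way around each circle
    return np.sqrt((diff ** 2).sum(axis=-1))

def moduli_penalty(W, dist):
    """Distance-weighted L1 penalty on the recurrent matrix W.
    Weights between far-apart neurons cost more, pushing them toward zero.
    (Using the raw distance as the coefficient is an assumption.)"""
    return float(np.sum(dist * np.abs(W)))

def magnitude_prune(W, sparsity):
    """Return a copy of W with the smallest-magnitude entries set to zero.
    `sparsity` is the fraction of entries to remove."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return W * (np.abs(W) > threshold)

# Example: 64 hidden neurons on an 8 x 8 torus grid.
rng = np.random.default_rng(0)
points = torus_embedding(8)
dist = torus_distances(points)
W = rng.normal(size=(64, 64))          # stand-in for trained recurrent weights
penalty = moduli_penalty(W, dist)      # would be added to the training loss
W_sparse = magnitude_prune(W, 0.9)     # keep roughly 10% of the connections
```

In a real training loop the penalty would be scaled by a regularization strength and added to the task loss, so that weights surviving the pruning step tend to connect neurons that are close in the chosen space. Swapping `torus_distances` for a distance on another space (a Klein bottle or a sphere, for instance) is the kind of comparison the summary mentions.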

Geometric sparsification in recurrent neural networks

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

Universal structural patterns in sparse recurrent neural networks

Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models

Learning Low-Rank Structured Sparsity in Recurrent Neural Networks

Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood

Investigating Sparsity in Recurrent Neural Networks

Bayesian Sparsification of Recurrent Neural Networks

Effective Model Sparsification by Scheduled Grow-and-Prune Methods

Less is More -- Towards parsimonious multi-task models using structured sparsity

Block-Sparse Recurrent Neural Networks

Training a neural netwok for data reduction and better generalization

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

A Geometric Modeling of Occam's Razor in Deep Learning

A Theoretical Explanation of Activation Sparsity Through Flat Minima and Adversarial Robustness

The Geometric Occam's Razor Implicit in Deep Learning

Why neural networks find simple solutions: the many regularizers of geometric complexity

Differentiable Sparsification for Deep Neural Networks

Minimum Variance Unbiased N:M Sparsity for the Neural Gradients

Always-Sparse Training by Growing Connections with Guided Stochastic Exploration

Structured flexibility in recurrent neural networks via neuromodulation