Abstract:Recurrent neural networks (RNNs) have achieved state-of-the-art performances in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector. Depending on its position in the table, a word is jointly represented by two components: a row vector and a column vector. Since the words in the same row share the row vector and the words in the same column share the column vector, we only need $2 \sqrt{|V|}$ vectors to represent a vocabulary of $|V|$ unique words, which are far less than the $|V|$ vectors required by existing approaches. Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets. The results show that our algorithm significantly reduces the model size and speeds up the training process, without sacrifice of accuracy (it achieves similar, if not better, perplexity as compared to state-of-the-art language models). Remarkably, on the One-Billion-Word benchmark Dataset, our algorithm achieves comparable perplexity to previous language models, whilst reducing the model size by a factor of 40-100, and speeding up the training process by a factor of 2. We name our proposed algorithm \emph{LightRNN} to reflect its very small model size and very high training speed.

Global context-dependent recurrent neural network language model with sparse feature learning

GNN-LM: Language Modeling based on Global Contexts via GNN

Long-Short Range Context Neural Networks for Language Modeling

Residual Recurrent Neural Networks for Learning Sequential Representations.

Dialog context language modeling with recurrent neural networks

RECURRENT NEURAL NETWORK BASED LANGUAGE MODELING WITH CONTROLLABLE EXTERNAL MEMORY

NEWLSTM: an Optimized Long Short-Term Memory Language Model for Sequence Prediction.

Discriminative method for recurrent neural network language models

End-to-End Speech Recognition Contextualization with Large Language Models

Recurrent Memory Networks for Language Modeling

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.

LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Efficient Long-range Language Modeling with Self-supervised Causal Retrieval

ELSTM: An improved long short‐term memory network language model for sequence learning

Deep Neural Networks Language Model Based on CNN and LSTM Hybrid Architecture

Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding.

Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

A Study on Neural Network Language Modeling

Adaptation of Language Models for SMT Using Neural Networks with Topic Information.

Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition.

A Latent Variable Recurrent Neural Network for Discourse Relation Language Models