Abstract: Language modeling is one of the most important areas of natural language processing. It serves as a bridge that lets computers recognize and comprehend human language, and it is a marker of progress in artificial intelligence. Language models are widely used in speech recognition, machine translation, information retrieval, and knowledge mapping. With the rapid advance of technology and hardware, the language model has evolved from the statistical model to the neural network model and then to the deep neural network model. The wide application of deep learning has made language modeling more extensive, complex, and expensive. This paper combines personalized input, convolutional neural network (CNN) encoding, and a union gate technique with the long short-term memory (LSTM) mechanism to improve the language model. The dynamic integration of LSTM and CNN is called Gated CLSTM. In the experiments, we used the deep learning framework TensorFlow to implement the Gated CLSTM architecture. In addition, some classical optimization techniques, such as noise contrastive estimation and a recurrent projection layer, were adopted. We tested the performance of Gated CLSTM on a large-scale open corpus and trained a single-layer model and a three-layer model to observe how network depth influences performance. The single-layer model was trained for 4 days in a four-GPU environment and reduced the perplexity to 42.1. The three-layer model reduced the perplexity to 33.1 in 6 days. Compared with some classical benchmark models, Gated CLSTM achieves significant improvements when hardware cost, training time, and perplexity are considered together.
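For readers unfamiliar with the metric, the perplexity figures above can be read through the standard definition: the exponential of the average negative log-likelihood per token. The following is a minimal sketch of that definition only, not the evaluation code used in the experiments; the function name `perplexity` and the uniform toy input are assumptions made for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Standard definition: exp of the mean negative log-likelihood.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical toy input: a model that assigns probability 1/42.1 to every
# token has perplexity 42.1, i.e. it is as uncertain as a uniform choice
# among about 42 words at each step.
toy_log_probs = [math.log(1 / 42.1)] * 1000
print(round(perplexity(toy_log_probs), 1))
```

Under this reading, lowering perplexity from 42.1 to 33.1 corresponds to narrowing the model's effective per-token choice from about 42 to about 33 equally likely words.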
