Abstract: An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. The ME framework is extremely general: any phenomenon that can be described in terms of statistics of the text can be readily incorporated. An adaptive language model based on the ME approach was trained on the Wall Street Journal corpus, and showed a 32–39% perplexity reduction over the baseline. When interfaced to SPHINX-II, Carnegie Mellon's speech recognizer, it reduced its error rate by 10–14%. This illustrates the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.
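To make the ME idea concrete, here is a minimal sketch of one iterative procedure of the kind the abstract alludes to, Generalized Iterative Scaling (GIS). The abstract does not name the algorithm, so using GIS here is an assumption, and the toy event space, the feature functions, the target expectations, and the names `gis`, `events`, `features`, and `targets` are all hypothetical, chosen for illustration only. Each feature plays the role of one constraint: its expectation under the model must match a target value.

```python
import math
from itertools import product


def gis(events, features, targets, iterations=2000):
    """Generalized Iterative Scaling over a finite event set (illustrative sketch).

    Returns the maximum entropy distribution p(x) proportional to
    exp(sum_i lambda_i * f_i(x)) whose feature expectations match `targets`.
    """
    # GIS needs the features to sum to the same constant C on every event,
    # so a slack feature is appended to make up the difference.
    C = max(sum(f(x) for f in features) for x in events)
    feats = list(features) + [lambda x: C - sum(f(x) for f in features)]
    goals = list(targets) + [C - sum(targets)]
    lambdas = [0.0] * len(feats)

    def distribution():
        weights = [math.exp(sum(l * f(x) for l, f in zip(lambdas, feats)))
                   for x in events]
        z = sum(weights)
        return [w / z for w in weights]

    for _ in range(iterations):
        p = distribution()
        for i, f in enumerate(feats):
            expected = sum(px * f(x) for px, x in zip(p, events))
            # Skip degenerate constraints to avoid log(0).
            if expected > 0 and goals[i] > 0:
                lambdas[i] += math.log(goals[i] / expected) / C

    return dict(zip(events, distribution()))


# Toy problem (hypothetical): two binary variables a and b,
# constrained only by E[a] = 0.7 and E[b] = 0.4.
events = list(product([0, 1], repeat=2))
features = [lambda x: x[0], lambda x: x[1]]
targets = [0.7, 0.4]

p = gis(events, features, targets)
for event in events:
    print(event, round(p[event], 4))
```

Because the only constraints are the two marginals, the highest-entropy distribution consistent with them treats `a` and `b` as independent, so the result should approach the product of the marginals (for example, about 0.28 for the event `(1, 1)`). A real language model would condition on the history and use n-gram and trigger-pair features, which this sketch does not attempt.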

Trigger-based language models: a maximum entropy approach

A maximum entropy approach to adaptive statistical language modelling

Incorporating Linguistic Structure into Maximum Entropy Language Models

An Improved Maximum Entropy Language Model

Combined maximum entropy language model using different feature sets

Efficient representation and fast look-up of Maximum Entropy language models

An Improved Maximum Entropy Language Model and Its Application

A trigger language model-based IR system

Language Model Evaluation Beyond Perplexity

A neural probabilistic language model

Trans-dimensional Random Fields for Language Modeling

Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization

Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution

Understanding and Mitigating Tokenization Bias in Language Models

Confidence Regulation Neurons in Language Models

Topic-based mixture language modelling

Exploring the Limits of Language Modeling

Maximum Reconstruction Estimation for Generative Latent-Variable Models

Improving and Scaling Trans-dimensional Random Field Language Models

Language Models Implement Simple Word2Vec-style Vector Arithmetic

Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models