Incorporating Linguistic Structure into Maximum Entropy Language Models

Fang GaoLin, Gao Wen, Wang ZhaoQi
DOI: https://doi.org/10.1007/bf02946662
2003-01-01
Abstract: Integrating diverse linguistic knowledge into a general framework that captures long-distance dependencies is a challenging issue in statistical language modeling. This paper presents an improved language model that incorporates linguistic structure into the maximum entropy framework. The proposed model combines a trigram with structural knowledge of base phrases: the trigram captures local relations between words, while the base-phrase structure represents long-distance relations between syntactic structures. Syntactic, semantic, and lexical knowledge is integrated within the maximum entropy framework. Experimental results show that, compared with the trigram model, the proposed model improves language model perplexity by 24% and increases the sign language recognition rate by about 3%.
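For orientation, the conditional maximum entropy form commonly used for language models can be sketched as below. The notation is generic and assumed here rather than taken from the paper: h stands for the history, f_i for binary feature functions (which in a model of this kind might include trigram features and base-phrase structure features), and \lambda_i for their weights. The last line reads the reported 24% as a relative perplexity reduction, which is an interpretation of the abstract and not a figure stated in that form.

```latex
% Generic conditional maximum entropy model (assumed notation, not the paper's)
P(w \mid h) = \frac{1}{Z(h)} \exp\Big( \sum_i \lambda_i f_i(h, w) \Big),
\qquad
Z(h) = \sum_{w'} \exp\Big( \sum_i \lambda_i f_i(h, w') \Big)

% Perplexity of a model over a test sequence w_1, ..., w_N with histories h_n
\mathrm{PP} = \exp\Big( -\frac{1}{N} \sum_{n=1}^{N} \log P(w_n \mid h_n) \Big)

% If the 24% improvement is a relative reduction (an assumption):
\mathrm{PP}_{\text{proposed}} \approx (1 - 0.24)\, \mathrm{PP}_{\text{trigram}} = 0.76\, \mathrm{PP}_{\text{trigram}}
```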