Class-Based Language Models for Chinese-English Parallel Corpus

Junfei Guo, Juan Liu, Michael Walsh, Helmut Schmid
DOI: https://doi.org/10.1007/978-3-642-37256-8_22
2013-01-01
Abstract: This paper addresses the use of novel class-based language models on parallel corpora, focusing specifically on English and Chinese. We find that the perplexity of Chinese is generally much higher than that of English and discuss the possible reasons. We demonstrate the relative effectiveness of class-based models over the modified Kneser-Ney trigram model for our task. We also introduce a rare events clustering and a polynomial discounting mechanism, which is shown to improve results. Our experimental results on parallel corpora indicate that the improvements due to classes are similar for English and Chinese. This suggests that class-based language models should be used for both languages.
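For readers unfamiliar with the terms, the following is a minimal sketch of how perplexity could be computed under a textbook class-based bigram model, where P(w_i | w_{i-1}) is approximated by P(c_i | c_{i-1}) * P(w_i | c_i). It is not the model proposed in the paper: the rare events clustering and polynomial discounting are not reproduced, add-alpha smoothing is used as a simple stand-in, and the function names, the toy corpus, and the hand-assigned word classes are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Collect the counts needed for P(c_i | c_{i-1}) and P(w_i | c_i)."""
    classes = [word2class[w] for w in tokens]
    class_counts = Counter(classes)                     # count(c)
    word_counts = Counter(tokens)                       # count(w)
    class_bigrams = Counter(zip(classes, classes[1:]))  # count(c_prev, c)
    context_counts = Counter(classes[:-1])              # count(c_prev as context)
    return class_counts, word_counts, class_bigrams, context_counts


def perplexity(tokens, word2class, model, num_classes, alpha=1.0):
    """Perplexity of `tokens` under the class bigram model.

    Assumes every word in `tokens` was seen in training, so that
    P(w | c) is never zero. Add-alpha smoothing is applied only to the
    class transition probabilities (an illustrative choice).
    """
    class_counts, word_counts, class_bigrams, context_counts = model
    classes = [word2class[w] for w in tokens]
    log_prob = 0.0
    for i in range(1, len(tokens)):
        prev_c, cur_c, w = classes[i - 1], classes[i], tokens[i]
        p_class = (class_bigrams[(prev_c, cur_c)] + alpha) / (
            context_counts[prev_c] + alpha * num_classes
        )
        p_word = word_counts[w] / class_counts[cur_c]
        log_prob += math.log(p_class * p_word)
    # The first token has no context, so len(tokens) - 1 words are predicted.
    return math.exp(-log_prob / (len(tokens) - 1))


# Toy data: hypothetical corpus and hand-assigned classes.
tokens = "the cat sat the dog sat".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V"}

model = train_class_bigram(tokens, word2class)
ppl = perplexity(tokens, word2class, model, num_classes=3)
print(f"perplexity: {ppl:.3f}")
```

Lower perplexity means the model assigns higher probability to the text. Grouping words into classes lets rare words share transition statistics with other members of their class, which is the general motivation for class-based models.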