A Context-Aware Topic Model for Statistical Machine Translation.

Jinsong Su, Deyi Xiong, Yang Liu, Xianpei Han, Hongyu Lin, Junfeng Yao, Min Zhang
DOI: https://doi.org/10.3115/v1/p15-1023
2015-01-01
Abstract: Lexical selection is crucial for statistical machine translation. Previous studies separately exploit sentence-level contexts and document-level topics for lexical selection, neglecting their correlations. In this paper, we propose a context-aware topic model for lexical selection, which not only models local contexts and global topics but also captures their correlations. The model uses target-side translations as hidden variables to connect document topics and source-side local contextual words. To learn hidden variables and distributions from data, we introduce a Gibbs sampling algorithm for statistical estimation and inference. A new translation probability based on distributions learned by the model is integrated into a translation system for lexical selection. Experimental results on NIST Chinese-English test sets demonstrate that 1) our model significantly outperforms previous lexical selection methods and 2) modeling correlations between local words and global topics can further improve translation quality.