Pivot probability induction for statistical machine translation with topic similarity

Yanzhou Huang, Xiaodong Shi, Jinsong Su, Yidong Chen, Guimin Huang
DOI: https://doi.org/10.12733/jcis8450
2013-01-01
Journal of Computational Information Systems
Abstract: Previous work employs the pivot language approach to conduct statistical machine translation when only a limited amount of bilingual corpus is available. Conventional solutions based on phrase-table combination overlook the semantic discrepancy between the source-pivot corpus and the pivot-target corpus, and consequently estimate the probabilities of the induced translation rules inaccurately. In this paper, the latent topic structure of the document-level training data is learned automatically, and each phrase translation rule is assigned a topic distribution. The phrase probability induction is then carried out on the basis of topic similarity, allowing the translation system to consider the semantic relatedness among different rules. Using BLEU as the metric of translation accuracy, we find that our system achieves an absolute improvement in the in-domain test compared with the baseline system. © 2013 Binary Information Press.
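The abstract does not state the induction formula. As a rough illustration only, the sketch below assumes the common phrase-table triangulation, φ(t|s) = Σ_p φ(p|s) φ(t|p), with each pivot path p weighted by the cosine similarity between the topic distributions of the source-pivot rule and the pivot-target rule, and the result renormalized over t. The function names, the dictionary layout, and the choice of cosine similarity are hypothetical assumptions, not the paper's method.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two topic distributions (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def induce_phrase_table(src_piv, piv_tgt):
    """Hypothetical topic-weighted triangulation (assumed formulation).

    src_piv: {(s, p): (phi_p_given_s, topic_vector)}
    piv_tgt: {(p, t): (phi_t_given_p, topic_vector)}
    Returns {(s, t): phi_t_given_s}, normalized over t for each source phrase s.
    """
    # Index pivot-target rules by pivot phrase so we can join on p.
    by_pivot = defaultdict(list)
    for (p, t), (prob, topics) in piv_tgt.items():
        by_pivot[p].append((t, prob, topics))

    # Sum over shared pivot phrases: triangulation term scaled by topic similarity.
    scores = defaultdict(float)
    for (s, p), (prob_sp, topics_sp) in src_piv.items():
        for t, prob_pt, topics_pt in by_pivot.get(p, []):
            scores[(s, t)] += cosine(topics_sp, topics_pt) * prob_sp * prob_pt

    # Renormalize so that phi(t|s) sums to 1 over t for each s.
    totals = defaultdict(float)
    for (s, _), score in scores.items():
        totals[s] += score
    return {(s, t): score / totals[s]
            for (s, t), score in scores.items() if totals[s] > 0}


if __name__ == "__main__":
    # Toy example: one pivot phrase with two target translations on different topics.
    src_piv = {("s", "p"): (1.0, [1.0, 0.0])}
    piv_tgt = {("p", "t_a"): (0.5, [1.0, 0.0]),
               ("p", "t_b"): (0.5, [0.0, 1.0])}
    print(induce_phrase_table(src_piv, piv_tgt))
```

In the toy run, plain triangulation would split φ(t|s) evenly between t_a and t_b; under the assumed topic weighting, all of the mass goes to t_a, whose pivot-target rule shares the source-pivot rule's topic. This only illustrates how topic similarity could reweight induced probabilities.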