Learning Bilingual Distributed Phrase Representations for Statistical Machine Translation

Chaochao Wang, Deyi Xiong, Min Zhang, Chunyu Kit
2015-01-01
Abstract: Following the idea of using distributed semantic representations to facilitate the computation of semantic similarity between translation equivalents, we propose a novel framework for learning bilingual distributed phrase representations for machine translation. We first induce vector representations for words in the source and target languages, each in its own semantic space. These word vectors are then used to create phrase representations via composition methods. To compute the semantic similarity of phrase pairs in the same semantic space, we project phrase representations from the source-side semantic space onto the target-side semantic space via a neural network that can perform a nonlinear transformation between the two spaces. We integrate the learned bilingual distributed phrase representations into a hierarchical phrase-based translation system to validate the effectiveness of the proposed framework. Experimental results show that our method significantly improves translation quality and outperforms previous methods that use only word representations or a linear transformation between semantic spaces.
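The pipeline described in the abstract (monolingual word vectors, composition into phrase vectors, nonlinear projection from the source space to the target space, then similarity scoring) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the composition method, the network architecture, or the similarity measure, so the sketch uses additive composition, a single tanh hidden layer with random untrained weights, and cosine similarity. The toy vocabulary, the dimensions, and the function names `compose`, `project`, and `cosine` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; not taken from the paper.
SRC_DIM, TGT_DIM, HIDDEN = 4, 3, 5

# Toy word vectors standing in for embeddings learned separately per language.
src_vecs = {"w1": rng.normal(size=SRC_DIM), "w2": rng.normal(size=SRC_DIM)}
tgt_vecs = {"v1": rng.normal(size=TGT_DIM), "v2": rng.normal(size=TGT_DIM)}


def compose(words, table):
    """Additive composition (an assumption): sum the word vectors of a phrase."""
    return np.sum([table[w] for w in words], axis=0)


# Random, untrained projection weights. In a real system these would be
# trained on aligned phrase pairs.
W1 = rng.normal(size=(HIDDEN, SRC_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(TGT_DIM, HIDDEN))
b2 = np.zeros(TGT_DIM)


def project(src_phrase_vec):
    """Map a source-space phrase vector into the target space nonlinearly."""
    hidden = np.tanh(W1 @ src_phrase_vec + b1)
    return W2 @ hidden + b2


def cosine(a, b):
    """Cosine similarity between two vectors in the same space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


src_phrase = compose(["w1", "w2"], src_vecs)
tgt_phrase = compose(["v1", "v2"], tgt_vecs)
similarity = cosine(project(src_phrase), tgt_phrase)
print(f"similarity = {similarity:.3f}")
```

In a translation system, a score of this kind would typically be added as one more feature of the decoder's model; because the weights here are untrained, the printed value carries no meaning beyond showing that the two phrase vectors end up comparable in the same space.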