Context-Aware Phrase Representation For Statistical Machine Translation

Zhiwei Ruan, Jinsong Su, Deyi Xiong, Rongrong Ji
DOI: https://doi.org/10.1007/978-3-319-97304-3_11
2018-01-01
Abstract: Phrases are the basic translation units in conventional phrase-based statistical machine translation (SMT), and learning compact vector representations for them is fundamental. However, most existing work focuses on the internal relationships among the words within a phrase; such approaches lack context information and are therefore insufficient for phrase representation learning. To solve this problem, we propose a context-aware phrase representation learning framework that extends the bilingually-constrained recursive autoencoder with a context modeling component. In this way, we obtain context-aware phrase representations. Furthermore, because words and topics form the basis of our model, we regard words and topics as two different types of vertices and construct a bipartite network. This naturally allows us to introduce a bipartite network embedding method to learn better word and topic embeddings, which further improves the quality of the phrase representations. To evaluate the effectiveness of our method, we conduct experiments on Chinese-English translation. Experimental results show that the proposed method significantly improves translation quality on the NIST test sets.