Harmonizing Word Alignments and Syntactic Structures for Extracting Phrasal Translation Equivalents

Dun Deng, Nianwen Xue, Shiman Guo
DOI: https://doi.org/10.3115/v1/w15-1001
2015-01-01
Abstract: Accurate identification of phrasal translation equivalents is critical to both phrase-based and syntax-based machine translation systems. We show that the extraction of many phrasal translation equivalents is made impossible by word alignments done without taking syntactic structures into consideration. To address the problem, we propose a new annotation scheme where word alignment and the alignment of non-terminal nodes (i.e., phrases) are done simultaneously to avoid conflicts between word alignments and syntactic structures. Relying on this new alignment approach, we construct a Hierarchically Aligned Chinese-English Parallel Treebank (HACEPT), and show that all phrasal translation equivalents can be automatically extracted based on the phrase alignments in HACEPT.
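As an illustration only, the kind of conflict the abstract describes can be sketched with the consistency criterion commonly used in phrase-pair extraction: a source span and a target span form an extractable pair only if no word inside either span is aligned to a word outside the other. The function name, the index-based representation, and the toy alignments below are hypothetical and are not taken from HACEPT or from the authors' annotation procedure.

```python
def is_consistent(links, src_span, tgt_span):
    """Return True if the phrase pair is consistent with the word alignment.

    links    : set of (source_index, target_index) word alignment links
    src_span : inclusive (start, end) indices of the source phrase
    tgt_span : inclusive (start, end) indices of the target phrase
    """
    s_lo, s_hi = src_span
    t_lo, t_hi = tgt_span
    has_inside_link = False
    for s, t in links:
        s_in = s_lo <= s <= s_hi
        t_in = t_lo <= t <= t_hi
        if s_in != t_in:
            # A link crosses the phrase boundary, so the pair cannot be extracted.
            return False
        if s_in and t_in:
            has_inside_link = True
    # Require at least one link inside the pair.
    return has_inside_link


# Toy alignment (hypothetical): three source words aligned one-to-one.
links_clean = {(0, 0), (1, 1), (2, 2)}
# The same alignment plus one extra link that crosses the phrase boundary.
links_crossing = links_clean | {(1, 0)}

# Suppose source words 1..2 and target words 1..2 are both syntactic constituents.
print(is_consistent(links_clean, (1, 2), (1, 2)))     # True
print(is_consistent(links_crossing, (1, 2), (1, 2)))  # False
```

In the second call a single word-level link leaving the constituent is enough to block extraction of the whole phrase pair, which is the situation that aligning words and non-terminal nodes together is meant to avoid.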