Korean Neural Machine Translation Using Hierarchical Word Structure

Jeonghyeok Park, Hai Zhao
DOI: https://doi.org/10.1109/IALP51396.2020.9310510
2020-01-01
Abstract: Korean neural machine translation (NMT) may suffer significantly from low-resource issues. We therefore propose an enhancement method that fully exploits the hierarchical Korean word embedding structure in the source representation. To the best of our knowledge, this is the first such attempt for Korean NMT tasks. Every Korean word can be decomposed into character- and jamo-level (sub-character) units. We therefore merge the character- and jamo-level representations with word embeddings to capture the important meaning that comes from combining these units. The merged representations are then fed into the NMT model. Our simple and novel method achieves BLEU improvements (up to 0.6) over word-based NMT baselines on Korean-to-Chinese and Korean-to-English translation tasks.
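To make the word, character, and jamo hierarchy concrete, the following is a minimal sketch of how a Korean word could be split into those three levels. It relies only on the standard Unicode layout of precomposed Hangul syllables; the function names `decompose_syllable` and `hierarchical_units` are hypothetical, and the paper's actual preprocessing and merging procedure is not described in the abstract, so this should be read as an illustration rather than the authors' implementation.

```python
# Standard Unicode layout of precomposed Hangul syllables (U+AC00..U+D7A3).
# These constants come from the Unicode Standard, not from the paper.
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
NUM_VOWELS = 21
NUM_FINALS = 28  # 27 final consonants plus the "no final" slot

INITIAL_BASE = 0x1100  # first conjoining initial consonant (choseong)
VOWEL_BASE = 0x1161    # first conjoining vowel (jungseong)
FINAL_BASE = 0x11A7    # one before the first final (jongseong); index 0 = none


def decompose_syllable(ch):
    """Split one Hangul syllable into its conjoining jamo.

    Characters outside the precomposed syllable block are returned unchanged.
    """
    code = ord(ch)
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return [ch]
    index = code - HANGUL_BASE
    initial, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    jamo = [chr(INITIAL_BASE + initial), chr(VOWEL_BASE + vowel)]
    if final:  # only syllables with a final consonant have a third jamo
        jamo.append(chr(FINAL_BASE + final))
    return jamo


def hierarchical_units(word):
    """Return the word-, character-, and jamo-level views of one word."""
    chars = list(word)
    jamo = [j for ch in chars for j in decompose_syllable(ch)]
    return {"word": word, "characters": chars, "jamo": jamo}


if __name__ == "__main__":
    units = hierarchical_units("한국어")
    print(units["characters"])
    print(units["jamo"])
```

In an NMT setting, each level would typically get its own embedding table, and the character- and jamo-level vectors for a word would be pooled and combined with the word embedding before the encoder. How the paper performs that merge is an open detail here.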