An Experimental Analysis of Unknown Words in Neural Machine Translation Using Sub-word Unit

HAN Dong, LI Junhui, XIONG Deyi, ZHOU Guodong
DOI: https://doi.org/10.3969/j.issn.1003-0077.2018.04.009
2018-01-01
Abstract: Neural machine translation, the state-of-the-art approach to machine translation, is substantially challenged by the translation of unknown words. Byte Pair Encoding (BPE) is a well-recognized solution in which a word is decomposed into higher-frequency sub-word units before translation. This paper investigates how effectively BPE resolves unknown word translation in Chinese-English translation. Experimental results show that BPE yields an improvement of 1.02 BLEU points. Further analysis reveals that neural machine translation with BPE achieves an accuracy of 0.45 in translating unknown words, comparable to that of classical statistical machine translation.
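To make the BPE idea in the abstract concrete, here is a minimal Python sketch of the standard greedy merge procedure: starting from characters, repeatedly merge the most frequent adjacent symbol pair, then segment words by replaying the learned merges. The function names `learn_bpe`, `merge_pair`, and `apply_bpe`, the toy word counts, the `</w>` end-of-word marker, and the number of merges are all illustrative assumptions; this is not the paper's implementation or its segmentation settings.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict (hypothetical helper)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Ties are broken by first occurrence, which keeps the sketch deterministic.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(toy_counts, 3)
    print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>')]
    print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

Note that "lowest" never appears in the toy counts, yet it is still segmented into known sub-word units. This is the mechanism by which BPE lets a translation model handle words that would otherwise be out of vocabulary.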