Improving Chinese-Vietnamese Neural Machine Translation with Linguistic Differences

Zhiqiang Yu, Zhengtao Yu, Yantuan Xian, Yuxin Huang, Junjun Guo
DOI: https://doi.org/10.1145/3477536
2022-03-31
Abstract: We present a simple, efficient data augmentation approach for boosting Chinese-Vietnamese neural machine translation performance by leveraging the linguistic difference between the two languages. We first define the formalized representation of modifier symmetry, which is one of the most representative linguistic differences between Chinese and Vietnamese. We then propose and test two data augmentation strategies for leveraging the linguistic difference, which can be integrated naturally with different translation models. Results indicate that both strategies can introduce linguistic rules to boost translation accuracy. Tests on Chinese-Vietnamese benchmarks show significant accuracy improvements. To facilitate studies in this domain, we also release an open-source toolkit with flexible implementation for Chinese-Vietnamese linguistic difference tagging.
computer science, artificial intelligence