Description and Findings of OPPO's Machine Translation Systems for CCMT 2020

Tingxun Shi, Qian Zhang, Xiaoxue Wang, Xiaopu Li, Zhengshan Xue, Jie Hao
DOI: https://doi.org/10.1007/978-981-33-6162-1_8
2020-01-01
Abstract: This paper describes our machine translation systems for CCMT 2020 and is organized in four parts. The last three parts report our results in the contest, focusing respectively on English-Chinese bidirectional translation, Japanese-Chinese-English multilingual translation (patent domain), and translation from Chinese minority languages to Mandarin Chinese. In each part, we present our work on data pre-processing and model training, as well as the application of general techniques such as back-translation, ensembling, and reranking. In addition, during our experiments we found, surprisingly, that simply applying different Chinese word segmentation tools to low-resource corpora brought clear benefits across different tasks; we devote a separate section to this finding. Among the 7 directions we participated in, we ranked first in 6 tasks (for the corpus filtering task, we ranked first in the 500 million words sub-task) and second in the remaining one.