A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task

Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yi Lu, Liangye He, Yiming Wang, Jiaji Zhou
2013-01-01
Abstract: This paper describes the machine translation evaluation systems we used in the WMT13 shared Metrics Task. We submitted two automatic MT evaluation systems, nLEPOR_baseline and LEPOR_v3.1. nLEPOR_baseline is an n-gram based, language-independent MT evaluation metric that employs a modified sentence length penalty, a position difference penalty, n-gram precision, and n-gram recall. It measures the similarity between system output translations and reference translations on word sequences only. LEPOR_v3.1 is a new version of the LEPOR metric that uses the harmonic mean to combine the factors and employs some linguistic features, such as part-of-speech information. The WMT13 evaluation results show that LEPOR_v3.1 yields the highest average score, 0.86, in correlation with human judgments at the system level, using the Pearson correlation criterion on English-to-other (FR, DE, ES, CS, RU) language pairs.
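To make two of the factors named in the abstract concrete, the following is a minimal sketch of n-gram precision, n-gram recall, and a weighted harmonic mean that combines them. It is an illustration under stated assumptions, not the authors' implementation: whitespace tokenization, clipped n-gram matching, the function names, and the weights `alpha` and `beta` are all hypothetical choices, and the sentence length penalty and position difference penalty are deliberately left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision_recall(hypothesis, reference, n=2):
    """Return (precision, recall) over n-grams of two whitespace-tokenized strings.

    Assumption: matches are clipped, so an n-gram is counted at most as
    often as it occurs in the other sentence.
    """
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    matched = sum((hyp & ref).values())  # multiset intersection = clipped matches
    hyp_total = sum(hyp.values())
    ref_total = sum(ref.values())
    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall


def weighted_harmonic_mean(precision, recall, alpha=1.0, beta=1.0):
    """Weighted harmonic mean of recall (weight alpha) and precision (weight beta).

    Assumption: alpha and beta are illustrative tunable weights; with
    alpha == beta this reduces to the ordinary harmonic mean (F1).
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (alpha + beta) / (alpha / recall + beta / precision)


if __name__ == "__main__":
    p, r = ngram_precision_recall("the cat sat on the mat", "the cat is on the mat")
    print(p, r, weighted_harmonic_mean(p, r))
```

In this sketch, raising `alpha` relative to `beta` pulls the combined score toward recall, and raising `beta` pulls it toward precision. This is one way a metric of this kind can be tuned per language pair.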