Abstract: Machine translation (MT) has become one of the most active research topics in natural language processing (NLP). An important issue in MT is how to evaluate an MT system reasonably and determine whether a translation system has actually improved. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes suffer from low agreement. Popular automatic MT evaluation methods, on the other hand, have weaknesses of their own. Firstly, they tend to perform well on language pairs with English as the target language but poorly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metrics difficult to replicate and to apply to other language pairs. Thirdly, some popular metrics use an incomplete set of factors, which results in low performance on some practical tasks. In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of the languages. Thirdly, in the enhanced version of our methods, we design concise linguistic features using part-of-speech (POS) information to show that our methods can yield even higher performance when using some external linguistic resources. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages. In addition, we present some novel work on quality estimation of MT without reference translations, including the use of Naïve Bayes (NB) probability models, support vector machine (SVM) classification algorithms, and conditional random fields (CRFs).

Enhanced Bilingual Evaluation Understudy

Bleu: a Method for Automatic Evaluation of Machine Translation

Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2013

Polish - English Speech Statistical Machine Translation Systems for the IWSLT 2014

Polish to English Statistical Machine Translation

BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking

Human Evaluation of English–Irish Transformer-Based NMT

A Study of Pre-editing Methods at the Lexical Level in the Process of Machine Translation

Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation

LEPOR: An Augmented Machine Translation Evaluation Metric

Polish-English Statistical Machine Translation of Medical Texts

Translation Methodology in the Spoken Language Translator: An Evaluation

Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation

Multi-domain machine translation enhancements by parallel data extraction from comparable corpora

Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation

Multi-Dimensional Machine Translation Evaluation: Model Evaluation and Resource for Korean

Difficulty-Aware Machine Translation Evaluation

The role of automated evaluation techniques in online professional translator training