Abstract: Machine translation (MT) has become one of the most active research topics in natural language processing (NLP). One important issue in MT is how to evaluate MT systems reasonably and determine whether a translation system has actually improved. Traditional manual judgment methods are expensive, time-consuming, and unrepeatable, and they sometimes show low agreement between judges. Popular automatic MT evaluation methods, on the other hand, have several weaknesses. Firstly, they tend to perform well on language pairs with English as the target language but poorly when English is the source language. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes them difficult to replicate and to apply to other language pairs. Thirdly, some popular metrics use an incomplete set of factors, which results in low performance on some practical tasks. In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weights of the factors can be optimized according to the characteristics of each language. Thirdly, in the enhanced version of our methods, we design a concise linguistic feature based on part-of-speech (POS) tags to show that our methods can yield even higher performance when some external linguistic resources are used. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages. In addition, we present some novel work on quality estimation of MT without reference translations, using Naïve Bayes (NB) probability models, support vector machine (SVM) classification, and conditional random fields (CRFs).
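The idea of "augmented factors" combined through a "tunable" weighting can be illustrated with a minimal sketch. It assumes factors modeled on the published LEPOR/hLEPOR formulation: a length penalty that punishes candidates both shorter and longer than the reference, a word-position difference penalty, and a weighted harmonic mean of recall and precision. The greedy unigram alignment, the function names, and the default weights (all 1.0) are illustrative assumptions, not the thesis's exact method; the published metric uses a context-aware alignment and tunes the weights per language pair.

```python
import math
from typing import List, Sequence, Tuple


def length_penalty(c: int, r: int) -> float:
    """Penalize candidates shorter OR longer than the reference (c, r = lengths)."""
    if c == r:
        return 1.0
    if c < r:
        return math.exp(1 - r / c)
    return math.exp(1 - c / r)


def align_unigrams(cand: List[str], ref: List[str]) -> List[Tuple[int, int]]:
    """ASSUMPTION: simplified greedy alignment. Each candidate token is matched to
    the unused identical reference token closest in relative position."""
    used = set()
    pairs = []
    for i, tok in enumerate(cand):
        best_j, best_d = None, None
        for j, rtok in enumerate(ref):
            if rtok != tok or j in used:
                continue
            d = abs((i + 1) / len(cand) - (j + 1) / len(ref))
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


def position_penalty(pairs: List[Tuple[int, int]], c: int, r: int) -> float:
    """exp(-NPD), where NPD averages the normalized position differences
    of aligned tokens over the candidate length."""
    npd = sum(abs((i + 1) / c - (j + 1) / r) for i, j in pairs) / c
    return math.exp(-npd)


def weighted_harmonic(values: Sequence[float], weights: Sequence[float]) -> float:
    """(sum of weights) / (sum of weight/value); zero if any factor is zero."""
    if any(v == 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


def hlepor_sentence(cand: List[str], ref: List[str],
                    alpha: float = 1.0, beta: float = 1.0,
                    w_lp: float = 1.0, w_pos: float = 1.0,
                    w_hpr: float = 1.0) -> float:
    """Hypothetical sentence score: weighted harmonic mean of the three factors."""
    c, r = len(cand), len(ref)
    if c == 0 or r == 0:
        return 0.0
    pairs = align_unigrams(cand, ref)
    if not pairs:
        return 0.0
    precision = len(pairs) / c
    recall = len(pairs) / r
    # (alpha + beta) / (alpha / R + beta / P)
    hpr = weighted_harmonic([recall, precision], [alpha, beta])
    lp = length_penalty(c, r)
    npp = position_penalty(pairs, c, r)
    return weighted_harmonic([lp, npp, hpr], [w_lp, w_pos, w_hpr])


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(hlepor_sentence(ref, ref))
    print(hlepor_sentence("on the mat the cat sat".split(), ref))
    print(hlepor_sentence("on the mat the cat sat".split(), ref, w_pos=3.0))
```

A reordered candidate with perfect precision and recall still scores below 1 because of the position penalty. Raising `w_pos` makes the score more sensitive to word order, which is the kind of per-language adjustment the tunable model is meant to allow.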
