Abstract: New machine translation (MT) technologies are emerging rapidly, and with them bold claims of human parity have seen the light of day (Läubli et al., 2018), such as: (i) the results produced approach "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By examining MT output in the light of linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse some of them and provide a solution. Our work addresses three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify the linguistic information that automatic translation systems lack in order to produce more accurate translations, and we integrate additional features into the existing pipelines. We identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural MT and link it to many of the remaining linguistic issues.

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

Lost in Machine Translation: A Method to Reduce Meaning Loss

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Quantifying Synthesis and Fusion and their Impact on Machine Translation

Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation

Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning

Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

No Language Left Behind: Scaling Human-Centered Machine Translation

Spanish and LLM Benchmarks: is MMLU Lost in Translation?

Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets

Lost in Interpreting: Speech Translation from Source or Interpreter?

On the Influence of Machine Translation on Language Origin Obfuscation

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

On the Integration of Linguistic Features into Statistical and Neural Machine Translation

Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages

Lost in Translation: The Algorithmic Gap Between LMs and the Brain

Translation Artifacts in Cross-lingual Transfer Learning

How Machine Translation Helps Foreign Language Students?