Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May
DOI: https://doi.org/10.48550/arXiv.1906.05683
2019-06-12
Abstract: Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then 'translating' the resulting pseudo-translation, or 'Translationese', into a fully fluent translation. We build our Translationese decoder once from a mish-mash of parallel data that has the target language in common and then can build dictionaries on demand using unsupervised techniques, resulting in rapidly generated unsupervised neural MT systems for many source languages. We apply this process to 14 test languages, obtaining better or comparable translation results on high-resource languages than previously published unsupervised MT studies, and obtaining good quality results for low-resource languages that have never been used in an unsupervised MT scenario.
Computation and Language
What problem does this paper attempt to address?
The paper aims to achieve high-quality neural machine translation (NMT) without parallel corpora. Specifically, it proposes a two-step method for building unsupervised NMT systems that can be produced quickly for many source languages without any source-target parallel data. The paper pays particular attention to low-resource languages, where the lack of parallel corpora makes conventional supervised methods hard to apply.

### Main contributions of the paper include:

1. **Two-step pipeline**:
   - **First step**: Use a bilingual dictionary to translate the source text word by word into Translationese, a non-fluent but meaning-preserving rendering in the target language.
   - **Second step**: Use a pre-trained model to convert this pseudo-translation into a fluent target-language translation.
2. **No parallel data required**:
   - The method only requires a comprehensive source-target dictionary, which can be generated automatically within a few hours using off-the-shelf tools.
   - Once the Translationese decoder is trained, it can be applied to new source languages without retraining or parameter tuning.
3. **Extensive experimental verification**:
   - The authors test the method on 14 languages, covering both high-resource and low-resource languages.
   - For high-resource languages, the method achieves translation quality comparable to or better than previously published unsupervised NMT results; for low-resource languages, it also achieves reasonably good results.

### Method overview:

1. **Constructing the dictionary**:
   - Use cross-lingual word embeddings to automatically construct a bilingual dictionary.
     Specifically, source and target word embeddings are trained separately, and a linear mapping then projects the source embedding space into the target embedding space, so that translation candidates for each source word can be found among the nearest target words in the shared space.
2. **Source text to pseudo-translation**:
   - Use a 5-gram target language model to select the best translation option for each source word, taking the surrounding context into account. The scoring formula is:
     \[ \text{Score} = \alpha \cdot P_{\text{LM}} + \beta \cdot d(s, t) \]
     where \( P_{\text{LM}} \) is the language model score, \( d(s, t) \) is the cosine distance between the source word and the target word, and \( \alpha \) and \( \beta \) are weight parameters.
3. **Pseudo-translation to target language**:
   - Train a Transformer model to translate from pseudo-translation into the target language. The training data comes from parallel corpora of several high-resource languages whose source sides have been converted into pseudo-translations.

### Conclusion:

The proposed method performs well across many languages, including low-resource languages, which earlier unsupervised NMT research had rarely covered. Its main advantages are that it adapts quickly to new languages and produces good translations without any parallel data.
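The dictionary-construction step described above can be sketched as follows. This is a minimal, hedged illustration, not the authors' code. It assumes a small seed dictionary and fits an ordinary least-squares linear map. That is a supervised simplification, since the unsupervised setting induces the mapping without seed pairs. The sketch then ranks target words by cosine similarity to the mapped source vector. The 2-D embeddings, the seed pairs, and the helper names (`fit_linear_map`, `translation_candidates`) are all hypothetical.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (a must be invertible)."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(src: Matrix, tgt: Matrix) -> Matrix:
    """Least-squares W minimizing ||src @ W - tgt||^2, via the normal equations."""
    xt = transpose(src)
    return solve(matmul(xt, src), matmul(xt, tgt))


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def translation_candidates(vec, w, tgt_emb, k=2):
    """Map a source vector into target space; return the k most cosine-similar target words."""
    mapped = matmul([vec], w)[0]
    ranked = sorted(tgt_emb, key=lambda word: cosine(mapped, tgt_emb[word]), reverse=True)
    return ranked[:k]


# Hypothetical 2-D embeddings: the source space is a rotated copy of the target space.
src_emb = {"casa": [0.0, 1.0], "perro": [-1.0, 0.0], "gato": [-0.9, 0.1]}
tgt_emb = {"house": [1.0, 0.0], "dog": [0.0, 1.0], "cat": [0.1, 0.9]}
seed = [("casa", "house"), ("perro", "dog")]  # assumed seed pairs

W = fit_linear_map([src_emb[s] for s, _ in seed], [tgt_emb[t] for _, t in seed])
print(translation_candidates(src_emb["gato"], W, tgt_emb))  # ['cat', 'dog']
```

"gato" is not in the seed list, yet the learned map places it next to "cat". Real systems use embeddings with hundreds of dimensions and often constrain the map to be orthogonal, which preserves distances within the source space.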
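The scoring step for the word-by-word gloss can be sketched as a small dynamic program. This is a hypothetical illustration that rests on several stated assumptions:

- A toy bigram table stands in for the 5-gram language model.
- \( P_{\text{LM}} \) enters as a log-probability.
- \( d(s, t) \) is treated as a similarity, so higher means a closer match. A positively weighted distance would instead reward poor matches.
- The search is an exact Viterbi pass, which may differ from the decoder the authors used.

All words and numbers are invented.

```python
import math

# Hypothetical toy bigram LM: log P(word | previous word); "<s>" marks the sentence start.
# It stands in for the 5-gram target-language LM; all probabilities are invented.
BIGRAM_LOGP = {
    ("<s>", "the"): math.log(0.5),
    ("the", "house"): math.log(0.02),
    ("the", "home"): math.log(0.005),
    ("house", "big"): math.log(0.01),
    ("house", "large"): math.log(0.001),
    ("home", "big"): math.log(0.001),
    ("home", "large"): math.log(0.001),
}
UNSEEN_LOGP = math.log(1e-6)  # crude floor for bigrams missing from the table


def lm_logp(prev: str, word: str) -> float:
    return BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)


def gloss(candidates, alpha=1.0, beta=1.0):
    """Choose one target word per source position (source order is kept), maximizing
    sum_i alpha * log P_LM(t_i | t_{i-1}) + beta * sim(s_i, t_i) with a Viterbi pass."""
    # best maps the last chosen word to (accumulated score, path ending in that word).
    best = {"<s>": (0.0, [])}
    for options in candidates:  # options: list of (target word, similarity) pairs
        step = {}
        for word, sim in options:
            step[word] = max(
                (score + alpha * lm_logp(prev, word) + beta * sim, path + [word])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Hypothetical candidates for Spanish "la casa grande", in source order.
cands = [
    [("the", 0.90)],                    # la
    [("house", 0.80), ("home", 0.82)],  # casa
    [("big", 0.70), ("large", 0.72)],   # grande
]
print(gloss(cands))             # ['the', 'house', 'big']
print(gloss(cands, alpha=0.0))  # ['the', 'home', 'large']
```

With the language model active (`alpha=1.0`), context pushes the choice to "house" and "big", even though the embedding scores slightly prefer "home" and "large". With `alpha=0.0`, the embedding preference wins. Either way the output keeps the source word order ("the house big"), which is exactly the kind of Translationese the second-stage decoder is trained to repair.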