Abstract: Existing Neural Machine Translation (NMT) systems are generally trained on large amounts of sentence-level parallel data, and at inference time each sentence is translated independently, ignoring cross-sentence contextual information. This leads to inconsistencies between translated sentences. Context-aware models have been proposed to address this issue. However, document-level parallel data constitutes only a small fraction of the available parallel data, so many approaches build context-aware models on top of a pre-trained, frozen sentence-level translation model in a two-step training procedure. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data in contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating context into translation. We find that representations from the shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding on weighted combinations of the pre-trained encoder layers' outputs. Instead of encoding the source context and the input separately, we propose to encode the source input and its context iteratively and jointly, and to generate input-aware context representations with a cross-attention layer and a gating mechanism that resets irrelevant information in context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on English-Russian subtitle data and is about twice as fast in both training and decoding.
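The two mechanisms named in the abstract, layer-wise weighting of pre-trained encoder outputs and input-aware gating of context, can be illustrated with a minimal, dependency-free sketch. It rests on assumptions the abstract does not specify: the layer weights are taken to be softmax-normalised scalar scores, cross-attention is a single head without learned projections, and the gate is a per-dimension sigmoid that scales ("resets") the original context before the attended source summary is added back. The function names `combine_layers`, `cross_attention`, and `input_aware_context` are hypothetical, and only a single encoding step is shown, not the iterative joint encoding.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per hidden dimension


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def combine_layers(layers: List[Matrix], layer_scores: List[float]) -> Matrix:
    """Weighted sum of per-layer encoder outputs; weights = softmax(layer_scores).
    The softmax-over-scalars form is an assumption, not taken from the paper."""
    weights = softmax(layer_scores)
    n_tokens, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[t][j] for w, layer in zip(weights, layers)) for j in range(dim)]
        for t in range(n_tokens)
    ]


def cross_attention(queries: Matrix, memory: Matrix) -> Matrix:
    """Single-head scaled dot-product attention, no learned projections (simplification)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) / math.sqrt(dim) for m in memory]
        probs = softmax(scores)
        out.append([sum(p * m[j] for p, m in zip(probs, memory)) for j in range(dim)])
    return out


def input_aware_context(context: Matrix, source: Matrix,
                        u: List[float], v: List[float], b: List[float]) -> Matrix:
    """Context tokens attend to the source input (a). A hypothetical per-dimension gate
    g = sigmoid(u*c + v*a + b) scales the original context c, then a is added back."""
    attended = cross_attention(context, source)
    out = []
    for c_row, a_row in zip(context, attended):
        row = []
        for j, (c, a) in enumerate(zip(c_row, a_row)):
            g = sigmoid(u[j] * c + v[j] * a + b[j])
            row.append(g * c + a)  # g -> 0 resets this context dimension
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy "pre-trained" encoder outputs: 3 layers, 2 context tokens, 2 dimensions.
    layers = [[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]],
              [[0.0, 1.0], [1.0, 0.0]]]
    # Hypothetical scores that favour the shallowest layer.
    context = combine_layers(layers, layer_scores=[2.0, 0.0, -1.0])
    source = [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]  # 3 source-input tokens
    ctx = input_aware_context(context, source, u=[0.5, 0.5], v=[0.5, 0.5], b=[0.0, 0.0])
    print(len(ctx), len(ctx[0]))  # prints "2 2"
```

In a real model the layer scores and gate parameters would be learned jointly with the rest of the context encoder; here they are fixed by hand only to make the data flow visible.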

Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating.