Abstract: Moderate-sized large language models (LLMs), those with 7B or 13B parameters, exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data despite its being human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

What problem does this paper attempt to address?

This paper examines how to improve the machine translation performance of moderate-sized large language models (LLMs). Although these models show promise, they still trail state-of-the-art conventional encoder-decoder translation models and larger LLMs such as GPT-4. The researchers first analyze the limitations of supervised fine-tuning (SFT), pointing out that the reference data has quality issues even though it is human-generated.

To address this, the paper proposes Contrastive Preference Optimization (CPO). Unlike SFT, which trains the model to mimic reference translations, CPO trains the model to avoid generating translations that are adequate but not perfect. CPO targets two fundamental shortcomings of SFT: first, SFT minimizes the gap between predicted outputs and the gold-standard references, so model performance is capped by the quality of those references; second, SFT has no mechanism for steering the model away from flawed translations.

Applying CPO to the ALMA models with only 22K parallel sentences, and updating only 12M parameters (roughly 0.1% of the model), significantly improves performance. The resulting model, ALMA-R, matches or exceeds GPT-4 and the WMT competition winners on the WMT'21, WMT'22, and WMT'23 test sets. The experiments indicate that CPO is efficient to train as well as effective at improving translation quality.
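The contrast between SFT and a preference-style objective can be illustrated with a small sketch. The summary above does not give the CPO formula, so the loss below is an assumption: a log-sigmoid preference term on the gap between a preferred and a dispreferred translation, plus a negative log-likelihood term on the preferred one. The function names, the `beta` value, and the example log-probabilities are hypothetical and chosen only for illustration.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def sft_loss(logp_reference: float) -> float:
    """SFT only pushes up the likelihood of the reference translation."""
    return -logp_reference


def contrastive_preference_loss(
    logp_preferred: float,
    logp_dispreferred: float,
    beta: float = 0.1,  # assumed scaling factor, not taken from the summary
) -> float:
    """Assumed preference-style loss (a sketch, not the paper's exact objective).

    The first term rewards ranking the preferred translation above the
    dispreferred one; the second keeps the preferred translation likely.
    """
    margin = logp_preferred - logp_dispreferred
    prefer_term = -log_sigmoid(beta * margin)
    nll_term = -logp_preferred
    return prefer_term + nll_term


# Hypothetical sequence log-probabilities under the model being trained.
good, near_miss, clearly_bad = -5.0, -6.0, -20.0

# SFT sees only the reference, so the dispreferred output never affects it.
print(sft_loss(good))

# The preference loss is higher when a flawed translation is nearly as
# likely as the preferred one, which is the case the method targets.
print(contrastive_preference_loss(good, near_miss))
print(contrastive_preference_loss(good, clearly_bad))
```

In this sketch the SFT loss is identical whether or not the model also assigns high probability to a near-miss translation, whereas the preference-style loss penalizes that situation directly.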

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
