Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G.C. de Souza, André F.T. Martins
2024-02-28
Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on our benchmark.
Computation and Language
What problem does this paper attempt to address?
This paper explores how to adapt open large language models (LLMs) to the range of tasks found in translation workflows. General-purpose closed LLMs handle many translation-related tasks well, whereas approaches built on open LLMs have been competitive only when specialized for a single task. The authors propose a two-stage recipe: continued pretraining on a multilingual mixture of monolingual and parallel data, which produces TowerBase, followed by finetuning on translation-related instructions, which produces TowerInstruct. The final model outperforms open alternatives and is competitive with closed models such as GPT-4 and GPT-3.5-turbo on multiple translation-related tasks. According to the reported experiments, TowerInstruct performs better than other open models on translation quality, automatic post-editing, grammatical error correction, and named entity recognition. The authors also build a diverse instruction dataset, TowerBlocks, to specialize the model for translation tasks, and evaluate it extensively in a multi-task setting. They release the Tower models, the TowerBlocks dataset, an evaluation framework, and a collection of model generations on their benchmark to support future research. In short, the paper builds a multilingual LLM that handles a variety of translation-related tasks, helping to close the gap between open and closed models in this area.
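The first stage of the recipe, continued pretraining on a mixture of monolingual and parallel data, can be illustrated with a small sampling sketch. This is not the authors' pipeline: the function name `sample_mixture`, the `parallel_fraction` parameter, and the way a parallel pair is joined into one text are all assumptions made for illustration, and the page does not state the actual mixing ratio.

```python
import random


def sample_mixture(monolingual, parallel, parallel_fraction, n, seed=0):
    """Draw n pretraining examples from two pools.

    Each example comes from the parallel pool with probability
    `parallel_fraction` (a hypothetical knob, not a value from the paper)
    and from the monolingual pool otherwise.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        if rng.random() < parallel_fraction:
            source, target = rng.choice(parallel)
            # Assumption: a parallel pair is serialized as source, newline, target.
            batch.append(("parallel", f"{source}\n{target}"))
        else:
            batch.append(("monolingual", rng.choice(monolingual)))
    return batch


if __name__ == "__main__":
    mono = ["O tempo está bom hoje.", "The weather is nice today."]
    para = [("Good morning.", "Bom dia."), ("Thank you.", "Obrigado.")]
    for kind, text in sample_mixture(mono, para, parallel_fraction=0.5, n=4):
        print(kind, repr(text))
```

Setting `parallel_fraction` to 0 yields monolingual text only, and setting it to 1 yields parallel text only; values in between give a blend of the two data types named in the abstract.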