Abstract: Large Language Models (LLMs) show strong general capabilities, and a compelling current challenge is eliciting their specialized capabilities, such as machine translation, through low-cost instruction tuning. Standard instruction-following data is organized sequentially as the concatenation of an instruction, an input, and a response. Because the attention mechanism of LLMs tends toward local focus, the model attends more to nearby words or sentences at each position, which creates a high risk of instruction forgetting during decoding. To alleviate this issue, we propose SWIE (Segment-Weighted Instruction Embedding) and an instruction-following dataset, OVERMISS. SWIE improves the model's instruction understanding by adding a global instruction representation to the subsequent input and response representations. OVERMISS improves model faithfulness by contrasting over-translation and miss-translation results with the correct translation. We apply our methods to two mainstream open-source LLMs, BLOOM and LLaMA. The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long-text translation, owing to the reduced risk of instruction forgetting. Additionally, OVERMISS outperforms the baseline in translation performance (e.g., BLEU gains ranging from 0.69 to 3.12 and an average improvement of 0.48 percentage points in COMET score for LLaMA-7b), with further gains in models combining OVERMISS and SWIE (e.g., BLEU gains of up to 0.56 for English-to-German across three different backbones), and both exhibit improvements in the faithfulness metric based on word alignment.
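The idea of adding a global instruction representation to the later input and response positions can be illustrated with a minimal sketch. This is not the authors' implementation: the mean pooling, the single scalar `weight`, and the function names `mean_pool` and `add_instruction_embedding` are all assumptions made for illustration, since the abstract does not specify how the instruction is pooled or how segments are weighted.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def add_instruction_embedding(
    hidden: List[Vector], instr_len: int, weight: float = 0.5
) -> List[Vector]:
    """Add a pooled instruction vector to every position after the instruction.

    hidden    : one vector per token, ordered instruction, input, response.
    instr_len : number of leading positions that belong to the instruction.
    weight    : hypothetical scalar standing in for the segment weighting.
    """
    if not 0 < instr_len <= len(hidden):
        raise ValueError("instr_len must be between 1 and len(hidden)")

    # Assumption: the global instruction representation is a plain mean.
    global_instr = mean_pool(hidden[:instr_len])

    # Instruction positions are copied unchanged.
    out = [list(h) for h in hidden[:instr_len]]

    # Input and response positions receive the weighted instruction vector.
    for h in hidden[instr_len:]:
        out.append([x + weight * g for x, g in zip(h, global_instr)])
    return out


# Toy example: two instruction tokens followed by two input/response tokens.
states = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
fused = add_instruction_embedding(states, instr_len=2, weight=0.5)
print(fused)
```

In this toy setting, every position after the instruction carries the same pooled instruction signal regardless of its distance from the instruction, which is the intuition behind reducing instruction forgetting on long inputs.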

Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

Understanding BLOOM: An empirical study on diverse NLP tasks

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM

Relations between systolic and diastolic function in children with dilated and hypertrophic cardiomyopathy as assessed by tissue Doppler imaging.

Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results

How Multilingual Are Large Language Models Fine-Tuned for Translation?

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Crosslingual Generalization through Multitask Finetuning

Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback

Assessing Translation capabilities of Large Language Models involving English and Indian Languages

What Drives Performance in Multilingual Language Models?

Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

No Language Left Behind: Scaling Human-Centered Machine Translation

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models

Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners