BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng

2023-06-21

Abstract: Large language models (LLMs) have demonstrated remarkable prowess in language understanding and generation. Advancing from foundation LLMs to instruction-following LLMs, instruction tuning plays a vital role in aligning LLMs to human preferences. However, the existing LLMs are usually focused on English, leading to inferior performance in non-English languages. In order to improve the performance for non-English languages, it is necessary to collect language-specific training data for foundation LLMs and construct language-specific instructions for instruction tuning, both of which are heavy loads. To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task. We have developed BayLing, an instruction-following LLM, by utilizing LLaMA as the foundation LLM and automatically constructing interactive translation instructions for instruction tuning. Extensive assessments demonstrate that BayLing achieves comparable performance to GPT-3.5-turbo, despite utilizing a considerably smaller parameter size of only 13 billion. Experimental results on translation tasks show that BayLing achieves 95% of single-turn translation capability compared to GPT-4 with automatic evaluation and 96% of interactive translation capability compared to GPT-3.5-turbo with human evaluation. To estimate the performance on general tasks, we created a multi-turn instruction test set called BayLing-80. The experimental results on BayLing-80 indicate that BayLing achieves 89% of performance compared to GPT-3.5-turbo. BayLing also demonstrates outstanding performance on knowledge assessment of Chinese GaoKao and English SAT, second only to GPT-3.5-turbo among a multitude of instruction-following LLMs. Demo, homepage, code and models of BayLing are available.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the weak performance of large language models (LLMs) in non-English languages. Existing LLMs are trained mostly on English data, so they perform poorly on non-English tasks. Closing that gap would normally require collecting language-specific training data for the foundation model and constructing language-specific instructions for instruction tuning, both of which are costly.

To avoid that cost, the authors propose BayLing, an instruction-following LLM that transfers language generation and instruction-following capabilities from English to other languages through an interactive translation task. BayLing is built on LLaMA and fine-tuned on automatically constructed interactive translation instructions. The approach is intended to improve generation in non-English languages and, at the same time, the model's ability to understand user intent and follow instructions, without large language-specific corpora or instruction sets.

The authors evaluate BayLing on multilingual translation, interactive translation, general tasks, and standardized tests. Despite its relatively small size of 13 billion parameters, BayLing performs close to GPT-3.5-turbo on these tasks. On translation it reaches 95% of GPT-4's single-turn translation capability under automatic evaluation and 96% of GPT-3.5-turbo's interactive translation capability under human evaluation. On the multi-turn BayLing-80 test set it reaches 89% of GPT-3.5-turbo's performance, and on the Chinese GaoKao and English SAT knowledge assessments it ranks second only to GPT-3.5-turbo among the instruction-following LLMs compared.

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Evaluating Large Language Models at Evaluating Instruction Following

Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Bailong: Bilingual Transfer Learning based on QLoRA and Zip-tie Embedding

An Empirical Study of Instruction-tuning Large Language Models in Chinese

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Human-in-the-loop Machine Translation with Large Language Model

BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning

Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations

BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing

Align^2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios

Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention

Extrapolating Large Language Models to Non-English by Aligning Languages