Abstract: Large language models (LLMs) have demonstrated remarkable prowess in language understanding and generation. In advancing from foundation LLMs to instruction-following LLMs, instruction tuning plays a vital role in aligning LLMs with human preferences. However, existing LLMs usually focus on English, leading to inferior performance in non-English languages. Improving performance for non-English languages requires collecting language-specific training data for foundation LLMs and constructing language-specific instructions for instruction tuning, both of which are labor-intensive. To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task. We have developed BayLing, an instruction-following LLM that uses LLaMA as the foundation LLM and automatically constructs interactive translation instructions for instruction tuning. Extensive assessments demonstrate that BayLing achieves performance comparable to GPT-3.5-turbo despite having a considerably smaller parameter size of only 13 billion. Experimental results on translation tasks show that BayLing achieves 95% of the single-turn translation capability of GPT-4 under automatic evaluation and 96% of the interactive translation capability of GPT-3.5-turbo under human evaluation. To estimate performance on general tasks, we created a multi-turn instruction test set called BayLing-80. The experimental results on BayLing-80 indicate that BayLing achieves 89% of the performance of GPT-3.5-turbo. BayLing also demonstrates outstanding performance on knowledge assessments based on the Chinese GaoKao and the English SAT, second only to GPT-3.5-turbo among a multitude of instruction-following LLMs. Demo, homepage, code and models of BayLing are available.

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

RNR: Teaching Large Language Models to Follow Roles and Rules

CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning

Evaluating Large Language Models at Evaluating Instruction Following

CITING: Large Language Models Create Curriculum for Instruction Tuning

Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

Enhancing Task Performance in Continual Instruction Fine-tuning Through Format Uniformity

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models

Policy Improvement using Language Feedback Models

Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models