Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g., GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g., multi-turn conversation and human preference alignment, and are thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent catastrophic forgetting of the original knowledge during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into several common sub-tasks with the translation chain-of-thought, which uses translation as the bridge between English and non-English step by step. We further enhance the performance of the sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training, which maintains the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform LLaMA-2-chat-7B to the Thai language. Our method, using only single-turn data, outperforms strong baselines and ChatGPT on the multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries from the safety benchmark AdvBench than both ChatGPT and GPT-4.
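
The translation chain-of-thought described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the steps are implemented, so the three callables below (`to_english`, `chat_in_english`, `to_target`) are hypothetical placeholders for the sub-tasks, not the authors' code; the toy stubs in the usage lines only tag strings and perform no real translation.

```python
from typing import Callable, Dict


def tcot_respond(
    query: str,
    to_english: Callable[[str], str],       # hypothetical: target language -> English
    chat_in_english: Callable[[str], str],  # hypothetical: English chat response
    to_target: Callable[[str], str],        # hypothetical: English -> target language
) -> Dict[str, str]:
    """Answer a non-English query by using English as the bridge."""
    english_query = to_english(query)                # step 1: translate the query
    english_answer = chat_in_english(english_query)  # step 2: respond in English
    response = to_target(english_answer)             # step 3: translate the answer back
    # Keep the intermediate steps so the chain can be inspected.
    return {
        "english_query": english_query,
        "english_answer": english_answer,
        "response": response,
    }


# Toy usage with stub functions that only swap a language tag.
result = tcot_respond(
    "[th] question",
    to_english=lambda q: q.replace("[th]", "[en]"),
    chat_in_english=lambda q: q + " -> answer",
    to_target=lambda a: a.replace("[en]", "[th]"),
)
print(result["response"])
```

Keeping the intermediate English query and answer in the result makes each sub-task separately checkable, which matches the idea of splitting the transfer problem into common sub-tasks.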


### What problem does this paper attempt to solve?

This paper explores how to convert English-centric chat large language models (LLMs) into non-English language models. Specifically, it proposes solutions to the following two key issues:

1. **How to effectively transfer advanced capabilities without supervised data?**
   - English chat LLMs possess advanced capabilities such as multi-turn conversation and human preference alignment, but the supervised data for these capabilities is not available during the conversion process.
2. **How to prevent catastrophic forgetting of original knowledge during the conversion process?**
   - When an English chat LLM is converted into a non-English model, training must avoid overwriting the original English knowledge.

The paper proposes a simple framework named **TransLLM** to address these issues. The framework processes non-English queries step by step through the Translation Chain of Thought (TCOT), and it combines Low-Rank Adaptation (LoRA) with Recovery Knowledge Distillation (recovery KD) to retain the original knowledge.

### Main Contributions

- Proposed an effective and simple framework, TransLLM, for converting chat LLMs into non-English models.
- Experiments demonstrate that TransLLM successfully transfers advanced capabilities (such as multi-turn conversation and human preference alignment) with limited data and shows superior helpfulness and safety in Thai compared to ChatGPT.
- Analysis shows that recovery KD combined with LoRA successfully retains the original knowledge, allowing the model to use the original knowledge for English tasks and new knowledge for non-English tasks.
- Discussed the limitations of TransLLM and pointed out future research directions.

Through these methods, the paper aims to lay a solid foundation for developing safe non-English LLMs.
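
As a rough illustration of why LoRA leaves the original parameters intact, the sketch below computes a layer output as the frozen weight's output plus a scaled low-rank update. It is a generic, dependency-free rendering of low-rank adaptation under assumed names (`lora_forward`, `lora_a`, `lora_b`, `scale`), not the paper's training code. The helper `build_recovery_kd_data` is likewise a hypothetical stand-in for the idea that recovery KD uses responses generated by the chat LLM itself.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(
    frozen_w: Matrix,  # original weight, shape (out, in); never modified here
    lora_a: Matrix,    # trainable down-projection, shape (r, in)
    lora_b: Matrix,    # trainable up-projection, shape (out, r)
    x: Vector,
    scale: float = 1.0,
) -> Vector:
    """Return W x + scale * B (A x), the standard low-rank adaptation form."""
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base, update)]


def build_recovery_kd_data(
    prompts: List[str],
    chat_model: Callable[[str], str],  # hypothetical handle to the original chat LLM
) -> List[Tuple[str, str]]:
    """Pair each prompt with the chat model's own response as a training target."""
    return [(p, chat_model(p)) for p in prompts]


# With B initialised to zeros, the adapted layer reproduces the frozen layer.
frozen = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b_zero = [[0.0], [0.0]]
print(lora_forward(frozen, a, b_zero, [2.0, 3.0]))
```

Because only `lora_a` and `lora_b` would be trained, the frozen weight still encodes the original model, which is the property the summary attributes to the LoRA and recovery KD combination.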

Why Not Transform Chat Large Language Models to Non-English?
