Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs

Xiang Zhang, Senyu Li, Bradley Hauer, Ning Shi, Grzegorz Kondrak
DOI: https://doi.org/10.48550/arXiv.2305.16339
2023-10-24
Abstract: Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities and have excelled in a variety of natural language processing (NLP) tasks in recent years. Despite the fact that most LLMs are trained predominantly in English, multiple studies have demonstrated their comparative performance in many other languages. However, fundamental questions persist regarding how LLMs acquire their multi-lingual abilities and how performance varies across different languages. These inquiries are crucial for the study of LLMs since users and researchers often come from diverse language backgrounds, potentially influencing their utilization and interpretation of LLMs' results. In this work, we propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings. We investigate the phenomenon of across-language generalizations in LLMs, wherein insufficient multi-lingual training data leads to advanced multi-lingual capabilities. To accomplish this, we employ a novel back-translation-based prompting method. The results show that GPT exhibits highly translating-like behaviour in multilingual settings.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines the multilingual capabilities of large language models (LLMs) and how their performance varies across languages. Although most LLMs are trained predominantly on English data, studies have shown comparable performance in many other languages. Fundamental questions remain, however, about how LLMs acquire their multilingual abilities and how performance differs from one language to another. These questions matter because users and researchers come from diverse language backgrounds, which may influence how they use LLMs and interpret their outputs. The paper therefore proposes a systematic approach to characterizing the performance disparities of LLMs in multilingual settings, and uses a novel back-translation-based prompting method to investigate cross-language generalization, that is, how limited multilingual training data can nonetheless lead to advanced multilingual capabilities. Specifically, the paper focuses on the following points:

1. **Classification of Multilingual Capabilities**: The paper divides language-dependent tasks into three categories (Reasoning, Knowledge Access, and Articulation) to analyze how the choice of language affects task performance.
2. **Translation-Equivariant and Translation-Variant Tasks**: The paper introduces the concepts of Translation-Equivariant (TE) and Translation-Variant (TV) tasks to evaluate how consistently a task is performed across languages.
3. **Experimental Methods**: The paper uses Prompt Translation (PT) and Response Back-Translation (RBT) to measure the performance of LLMs in different languages and the consistency of their answers.

Through these methods, the paper aims to reveal the behavioral patterns of LLMs when handling multilingual tasks, in particular whether they behave like compound, coordinate, or subordinate multilinguals.
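The Prompt Translation and Response Back-Translation steps can be pictured with a minimal sketch. This is only one plausible reading of the summary, not the authors' implementation: the function names, the `model` and `translate` callables, and the exact-match comparison after normalization are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
#   Translate: (text, source_lang, target_lang) -> translated text
#   Model:     prompt -> response
Translate = Callable[[str, str, str], str]
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Crude normalization so trivial differences do not count as mismatches."""
    return text.strip().lower()


def prompt_translation_consistency(
    prompts: List[str],
    model: Model,
    translate: Translate,
    target_lang: str,
    source_lang: str = "en",
) -> float:
    """Fraction of prompts whose answer survives a round trip through target_lang."""
    if not prompts:
        raise ValueError("prompts must be non-empty")
    matches = 0
    for prompt in prompts:
        # Baseline: answer the prompt in the source language.
        source_answer = model(prompt)
        # Prompt Translation (PT): pose the same prompt in the target language.
        target_answer = model(translate(prompt, source_lang, target_lang))
        # Response Back-Translation (RBT): bring the answer back for comparison.
        back_translated = translate(target_answer, target_lang, source_lang)
        # Assumed comparison: exact match after normalization.
        if normalize(back_translated) == normalize(source_answer):
            matches += 1
    return matches / len(prompts)
```

A score near 1.0 would suggest the model answers the same way regardless of prompt language, while a low score would point to language-dependent behavior. In practice, exact matching is likely too strict for free-form answers, so a task-specific comparison would replace `normalize`.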