Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei
2023-10-22
Abstract: Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages. In this work, we introduce a simple yet effective method, called cross-lingual-thought prompting (XLT), to systematically improve the multilingual capability of LLMs. Specifically, XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages. We conduct comprehensive evaluations on 7 typical benchmarks related to reasoning, understanding, and generation tasks, covering both high-resource and low-resource languages. Experimental results show that XLT not only remarkably enhances the performance of various multilingual tasks but also significantly reduces the gap between the average performance and the best performance of each task in different languages. Notably, XLT brings over 10 points of average improvement in arithmetic reasoning and open-domain question-answering tasks.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the uneven multilingual capability of large language models (LLMs). LLMs show strong multilingual capability overall, but their performance varies substantially from one language to another, and they tend to underperform in low-resource languages. To reduce this imbalance, the paper proposes cross-lingual-thought prompting (XLT), a generic prompt template that elicits the model's cross-lingual and logical reasoning skills so that it performs better on tasks posed in different languages. The authors evaluate XLT on 7 typical benchmarks spanning reasoning, understanding, and generation tasks, covering both high-resource and low-resource languages. The results show that XLT improves performance across these multilingual tasks and narrows the gap between each task's average performance and its best performance across languages, with an average improvement of over 10 points on arithmetic reasoning and open-domain question answering.
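To make the idea of a generic template prompt more concrete, here is a minimal Python sketch. It is not the paper's template: the wording of XLT_STYLE_TEMPLATE, the field names, and both helper functions are assumptions made for illustration, based only on the description above (a reusable template that asks for cross-lingual restatement and step-by-step reasoning). The second helper shows one plausible reading of the reported gap metric, the best per-language score minus the average score for a task.

```python
from statistics import mean

# Hypothetical wording. The summary only states that XLT is a generic
# template that elicits cross-lingual and logical reasoning; the exact
# instructions used in the paper are not reproduced here.
XLT_STYLE_TEMPLATE = (
    "You are an expert in {task_name} for {language} text.\n"
    "{input_tag}: {task_input}\n"
    "First restate the {input_tag} in English.\n"
    "Then reason step by step in English to {task_goal}.\n"
    "Finally give the result in the format '{output_tag}: <result>'."
)


def build_prompt(task_name, language, input_tag, task_input, task_goal, output_tag):
    """Fill the (assumed) template for one task instance in one language."""
    return XLT_STYLE_TEMPLATE.format(
        task_name=task_name,
        language=language,
        input_tag=input_tag,
        task_input=task_input,
        task_goal=task_goal,
        output_tag=output_tag,
    )


def gap_to_best(scores_by_language):
    """Best per-language score minus the mean score across languages.

    One plausible reading of "the gap between the average performance and
    the best performance of each task in different languages".
    """
    scores = list(scores_by_language.values())
    return max(scores) - mean(scores)


if __name__ == "__main__":
    prompt = build_prompt(
        task_name="arithmetic reasoning",
        language="Swahili",
        input_tag="request",
        task_input="<question text in Swahili>",
        task_goal="solve the problem",
        output_tag="Answer",
    )
    print(prompt)

    # Made-up placeholder scores, not results from the paper.
    print(gap_to_best({"en": 80.0, "sw": 60.0}))
```

The same template is reused for every task and language; only the fields change. A smaller gap_to_best value after applying a prompting method would indicate that the languages' scores sit closer to the best-performing language.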