Abstract: This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). MLLMs are able to understand and generate language across linguistic boundaries, and they represent an important advancement in artificial intelligence. We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities. We then discuss the construction of multilingual pre-training and alignment datasets, underscoring the importance of data quality and diversity in enhancing MLLM performance. An important focus of this survey is the evaluation of MLLMs. We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability and specialized applications. Specifically, we extensively discuss multilingual evaluation benchmarks and datasets, and explore the use of LLMs themselves as multilingual evaluators. To move MLLMs from black boxes toward white boxes, we also address the interpretability of multilingual capabilities, cross-lingual transfer and language bias within these models. Finally, we provide a comprehensive review of real-world applications of MLLMs across diverse domains, including biology, medicine, computer science, mathematics and law. We showcase how these models have driven innovation and improvements in these specialized fields while also highlighting the challenges and opportunities in deploying MLLMs within diverse language communities and application scenarios. The papers covered in this survey are listed and publicly available at https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers.

What problem does this paper attempt to address?

This paper addresses the challenges that multilingual large language models (MLLMs) face when dealing with language diversity. It provides a comprehensive review of the latest research on MLLMs, examining how these models understand and generate language across linguistic boundaries and what progress they represent for artificial intelligence. The paper focuses on the following aspects:

1. **Architecture and Pre-training Objectives**: Discusses the architecture design and pre-training objectives of MLLMs, emphasizing the key components and methods that contribute to the models' multilingual ability.
2. **Construction of Multilingual Corpora**: Explores the construction of multilingual pre-training and alignment datasets, emphasizing the importance of data quality and diversity in improving MLLM performance.
3. **Evaluation Methods**: Covers the evaluation of MLLMs in detail, including cross-lingual knowledge, reasoning ability, alignment with human values, safety, interpretability, and specialized applications. It discusses multilingual evaluation benchmarks and datasets, and explores the use of LLMs themselves as multilingual evaluators.
4. **Interpretability**: Explores how to turn MLLMs from "black boxes" into "white boxes", discussing the interpretability of multilingual capabilities, cross-lingual transfer, and language bias.
5. **Practical Applications**: Reviews the practical applications of MLLMs in fields including biology, medicine, computer science, mathematics, and law, showing how these models drive innovation and improvement while pointing out the challenges and opportunities of deploying MLLMs in diverse language communities and application scenarios.

In summary, through a systematic review, this paper aims to provide a comprehensive understanding of the current state of research on MLLMs, the challenges they face, and directions for future development, in order to promote more inclusive and responsible language technology.

Multilingual Large Language Models: A Systematic Survey

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact

Efficient Multimodal Large Language Models: A Survey

A Survey on Benchmarks of Multimodal Large Language Models

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

A Survey on Evaluation of Large Language Models (Just Accepted)

A Survey of Multimodal Large Language Model from A Data-centric Perspective

A Survey on Evaluation of Large Language Models

A Survey for Large Language Models in Biomedicine

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Evaluating Large Language Models: A Comprehensive Survey

A Survey of Large Language Models

Surveying the MLLM Landscape: A Meta-Review of Current Surveys

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine

History, Development, and Principles of Large Language Models-An Introductory Survey

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

Personalized Multimodal Large Language Models: A Survey