Abstract: This paper provides a comprehensive survey of the latest research on multilingual large language models (MLLMs). MLLMs are not only able to understand and generate language across linguistic boundaries, but also represent an important advancement in artificial intelligence. We first discuss the architecture and pre-training objectives of MLLMs, highlighting the key components and methodologies that contribute to their multilingual capabilities. We then discuss the construction of multilingual pre-training and alignment datasets, underscoring the importance of data quality and diversity in enhancing MLLM performance. A major focus of this survey is the evaluation of MLLMs. We present a detailed taxonomy and roadmap covering the assessment of MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability and specialized applications. Specifically, we extensively discuss multilingual evaluation benchmarks and datasets, and explore the use of LLMs themselves as multilingual evaluators. To help move MLLMs from black boxes toward white boxes, we also address the interpretability of multilingual capabilities, cross-lingual transfer and language bias within these models. Finally, we provide a comprehensive review of real-world applications of MLLMs across diverse domains, including biology, medicine, computer science, mathematics and law. We showcase how these models have driven innovation and improvements in these specialized fields while also highlighting the challenges and opportunities in deploying MLLMs within diverse language communities and application scenarios. The papers covered in this survey are listed and publicly available at <a class="link-external link-https" href="https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers" rel="external noopener nofollow">this https URL</a>.

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

SUPERB: Speech Processing Universal PERformance Benchmark

A Large-Scale Evaluation of Speech Foundation Models

SUPERB-SG: Enhanced Speech Processing Universal PERformance Benchmark for Semantic and Generative Capabilities

SUPERB: Speech Understanding and PERformance Benchmark

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (Published at NeurIPS 2024 Track Datasets and Benchmarks)

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

OLR 2021 Challenge: Datasets, Rules and Baselines

A Survey on Benchmarks of Multimodal Large Language Models

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Multilingual Large Language Models: A Systematic Survey

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Roadmap towards Superhuman Speech Understanding using Large Language Models

Evaluating Self-Supervised Speech Representations for Indigenous American Languages