Distributed Training of Large Language Models

Wensheng Gan, Yongheng Wang, Philip S. Yu, Fanlong Zeng
DOI: https://doi.org/10.1109/ICPADS60453.2023.00126
2023-12-17
Abstract: The advent of large language models (LLMs) such as ChatGPT has opened revolutionary opportunities for a wide variety of applications across disciplines such as healthcare, law, and education. Research reports indicate that a model's performance is often closely tied to its parameter scale, which raises a question of broad interest: how does one train an LLM? Several distributed training frameworks are in common use today, including Megatron-LM and DeepSpeed. In this paper, we first give a brief introduction to the current development status of LLMs. Second, starting from that status, we introduce the parallel strategies commonly used in distributed LLM training. Next, we briefly introduce the underlying technologies and frameworks that LLMs rely on today, describing the currently popular frameworks and types of large models. Then we introduce the optimization techniques used in LLMs. Finally, we summarize the problems and challenges encountered in current LLM training and describe possible future directions for LLM development.
Law, Computer Science