Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview

Lei Guan, Dong-Sheng Li, Ji-Ye Liang, Wen-Jian Wang, Ke-Shi Ge, Xi-Cheng Lu
DOI: https://doi.org/10.1007/s11390-024-3872-3
IF: 1.871
2024-07-23
Journal of Computer Science and Technology
Abstract: Deep learning has become the cornerstone of artificial intelligence, playing an increasingly important role in human production and lifestyle. However, as the complexity of problem-solving increases, deep learning models become increasingly intricate, resulting in a proliferation of large language models with an astonishing number of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to addressing the significant challenge of training "big models". This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP. It also comprehensively compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques to achieve load balance for both intra-node and inter-node training. Furthermore, the main techniques to optimize computation, storage, and communication are presented, with potential research directions being discussed.
computer science, software engineering, hardware & architecture