Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

Ziming Liu, Shenggan Cheng, Haotian Zhou, Yang You
DOI: https://doi.org/10.1145/3581784.3607073
2023-08-30
Abstract: Large-scale language models have become increasingly challenging and expensive to train. Among various methods addressing this issue, Pipeline Parallelism has been widely employed to accommodate massive model weights within limited GPU memory. This paper introduces Hanayo, a wave-like pipeline parallelism strategy that boasts a concise structure and practical applicability, alongside a high-performance pipeline execution runtime to tackle the challenges of pipeline strategy implementation. Hanayo mitigates the issues of pipeline bubbles and excessive memory consumption prevalent in existing schemes, without resorting to model duplicates as in Chimera. Our evaluation, conducted on four distinct computing clusters and involving both GPT-like and BERT-like architectures with up to 32 GPUs, demonstrates up to a 30.4% increase in throughput compared to the state-of-the-art approach.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses several key challenges in training large language models:

1. **Memory Wall**: As parameter counts grow sharply, model weights far exceed the memory capacity of a single accelerator.
2. **Scaling Wall**: Training large models requires thousands of accelerators. This leads to complex parallel patterns and heavy communication overhead, which become a bottleneck for scaling.
3. **Computational Wall**: Large models and large-scale datasets place extremely high demands on computing power.
4. **Development Wall**: Complex parallel strategies and manual control of communication make developing and training large models extremely difficult.

To meet these challenges, the paper introduces **Hanayo**, a unified framework built on a wave-like pipeline-parallel strategy. Its main contributions are:

1. **Low bubble ratio and high performance**: The wave-like pipeline scheme achieves a low bubble ratio and high throughput, and performance improves further as the number of waves increases.
2. **Unified framework**: Hanayo proposes a unified pipeline-parallel framework and, through theoretical analysis, derives a unified performance model for pipeline parallelism.
3. **Decoupled runtime system**: The runtime is decoupled from any specific pipeline-parallel algorithm. It expresses schedules as action lists, which lets it support almost all pipeline-parallel algorithms, and it improves performance through features such as asynchronous communication.
4. **Experimental verification**: The paper benchmarks mainstream GPT-style and BERT-style models on four different computing clusters. Hanayo improves throughput by up to 30.4% over Chimera, the current state-of-the-art pipeline-parallel implementation.

In conclusion, Hanayo combines its wave-like pipeline-parallel strategy with a unified framework design to address the memory, scaling, computational, and development challenges of large-model training and to improve training efficiency.
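To build intuition for the "bubble ratio" in contribution 1, the sketch below uses the standard closed-form estimate for synchronous pipeline schedules with a flush (GPipe, 1F1B). It also extends that estimate to devices that each hold v model chunks, as in Megatron-LM's interleaved schedule. This is not Hanayo's own performance model. The sketch assumes p devices, m micro-batches, equal cost for every stage, and free communication. The function name `bubble_fraction` is illustrative.

```python
from fractions import Fraction


def bubble_fraction(num_devices: int, num_microbatches: int, chunks_per_device: int = 1) -> Fraction:
    """Share of a training step each device spends idle in a flush-based pipeline.

    Assumes every stage costs the same and communication is free. With p devices,
    m micro-batches and v model chunks per device, the bubble lasts (p - 1) / v
    micro-batch slots against m slots of useful work, so the idle share of the
    whole step is (p - 1) / (v * m + p - 1).
    """
    p, m, v = num_devices, num_microbatches, chunks_per_device
    if min(p, m, v) < 1:
        raise ValueError("all arguments must be positive")
    return Fraction(p - 1, v * m + p - 1)


if __name__ == "__main__":
    print(bubble_fraction(4, 8))     # 3/11: one chunk per device (GPipe / 1F1B)
    print(bubble_fraction(4, 8, 2))  # 3/19: two chunks per device (interleaved)
```

More micro-batches or more chunks per device shrink the bubble. Wave-like and bidirectional (Chimera) schedules pursue the same goal through different stage placements, so their exact ratios come from the paper's own analysis rather than this formula.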
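The decoupled runtime in contribution 3 can be illustrated with a toy simulator. A schedule generator emits one ordered action list per device, and a generic runtime executes whatever lists it receives. Everything below is a hypothetical sketch, not Hanayo's API:

- The names `Action`, `wave_stage_to_device`, `build_action_lists`, and `run` are invented for illustration.
- The zigzag placement, where stages sweep down the devices and back up once per wave, is an assumed reading of "wave-like".
- Every forward or backward step is assumed to take one tick, and communication is ignored. The real runtime also schedules (asynchronous) send and receive actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FORWARD, BACKWARD = 0, 1


@dataclass(frozen=True)
class Action:
    phase: int       # FORWARD or BACKWARD
    stage: int       # global stage index along the model
    microbatch: int


def wave_stage_to_device(num_devices: int, num_waves: int) -> List[int]:
    # Assumed zigzag placement: stages go down devices 0..P-1, then back up P-1..0,
    # once per wave, so the two stages at each turning point share a device.
    down = list(range(num_devices))
    return (down + down[::-1]) * num_waves


def dependencies(a: Action, num_stages: int) -> List[Action]:
    # Forward of stage s needs forward of stage s-1; backward of stage s needs
    # backward of stage s+1; the last stage's backward needs its own forward.
    if a.phase == FORWARD:
        return [Action(FORWARD, a.stage - 1, a.microbatch)] if a.stage > 0 else []
    if a.stage == num_stages - 1:
        return [Action(FORWARD, a.stage, a.microbatch)]
    return [Action(BACKWARD, a.stage + 1, a.microbatch)]


def build_action_lists(num_devices: int, num_waves: int,
                       num_microbatches: int) -> Tuple[Dict[int, List[Action]], int]:
    """Hypothetical schedule generator: all forwards, then all backwards (GPipe-style
    flush), each device's actions ordered by the tick they would run in an ideal pipeline."""
    if min(num_devices, num_waves, num_microbatches) < 1:
        raise ValueError("all arguments must be positive")
    placement = wave_stage_to_device(num_devices, num_waves)
    num_stages = len(placement)
    keyed: Dict[int, list] = {d: [] for d in range(num_devices)}
    for m in range(num_microbatches):
        for s, d in enumerate(placement):
            r = num_stages - 1 - s  # distance from the last stage
            keyed[d].append(((FORWARD, m + s, s), Action(FORWARD, s, m)))
            keyed[d].append(((BACKWARD, m + r, r), Action(BACKWARD, s, m)))
    lists = {d: [a for _, a in sorted(items, key=lambda kv: kv[0])] for d, items in keyed.items()}
    return lists, num_stages


def run(action_lists: Dict[int, List[Action]], num_stages: int,
        max_ticks: int = 100_000) -> Tuple[int, float]:
    """Generic runtime: knows nothing about the schedule; it executes each device's
    list in order, one action per tick, once that action's dependencies are done."""
    done_at: Dict[Action, int] = {}
    cursor = {d: 0 for d in action_lists}
    busy = tick = 0
    while any(cursor[d] < len(acts) for d, acts in action_lists.items()):
        if tick >= max_ticks:
            raise RuntimeError("action lists did not finish (possible deadlock)")
        finished = []
        for d, acts in action_lists.items():
            if cursor[d] < len(acts):
                a = acts[cursor[d]]
                # A dependency counts only if it completed in an earlier tick.
                if all(done_at.get(dep, tick) < tick for dep in dependencies(a, num_stages)):
                    finished.append(a)
                    cursor[d] += 1
        for a in finished:
            done_at[a] = tick
        busy += len(finished)
        tick += 1
    return tick, 1 - busy / (tick * len(action_lists))


if __name__ == "__main__":
    lists, num_stages = build_action_lists(num_devices=2, num_waves=1, num_microbatches=2)
    ticks, bubble = run(lists, num_stages)
    print(ticks, round(bubble, 3))  # 12 0.333
```

Because `run` only interprets action lists, any generator that produces deadlock-free lists (for example, a 1F1B ordering or more waves) can be swapped in without touching the runtime. That separation is the point of the sketch; the toy tick counts are not.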