Abstract: To accelerate the training of massive DNN models on large-scale datasets, distributed training techniques, including data parallelism and model parallelism, have been extensively studied. Pipeline parallelism, which is derived from model parallelism, has attracted particular attention. It splits the model parameters across multiple computing nodes and processes multiple mini-batches simultaneously. However, naive pipeline parallelism suffers from weight inconsistency and delayed gradients: the model parameters used in the forward and backward passes do not match, which causes unstable training and low performance. In this study, we propose a novel pipeline parallelism technique called EA-Pipe to address both problems. EA-Pipe applies elastic averaging, a method previously studied in the context of data parallelism, to pipeline parallelism. The proposed method maintains multiple model replicas to solve the weight inconsistency problem, and synchronizes the replicas using an elasticity-based moving average to mitigate the delayed gradient problem. To verify the efficacy of the proposed method, we conducted three image classification experiments on the CIFAR-10/100 and ImageNet datasets. The experimental results show that EA-Pipe not only accelerates training but also learns more stably than existing pipeline parallelism techniques. In particular, in the experiments using the CIFAR-100 and ImageNet datasets, EA-Pipe recorded error rates that were 2.58% and 2.19% lower, respectively, than the baseline pipeline parallelization method.
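
The abstract does not spell out EA-Pipe's update rule, so the following is only a minimal sketch of the generic elastic-averaging idea from the data-parallel setting that the method builds on: each replica takes a gradient step while being pulled toward a shared center, and the center moves toward the replicas. The function name `elastic_step`, the parameters `lr` and `rho`, and the toy quadratic objective are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a synchronous elastic-averaging step (generic form,
# NOT the EA-Pipe algorithm itself; all names here are hypothetical).

def elastic_step(replicas, center, grads, lr=0.1, rho=0.5):
    """Update every replica and the shared center once.

    replicas: list of parameter vectors (lists of floats), one per replica
    center:   shared center vector (list of floats)
    grads:    one gradient vector per replica, same shapes as replicas
    lr:       learning rate (assumed value)
    rho:      elastic coefficient pulling replicas toward the center (assumed)
    """
    new_replicas = []
    pull = [0.0] * len(center)  # accumulates sum_i (x_i - center)
    for x, g in zip(replicas, grads):
        diff = [xi - ci for xi, ci in zip(x, center)]
        # Gradient step plus an elastic force toward the center.
        new_replicas.append(
            [xi - lr * (gi + rho * di) for xi, gi, di in zip(x, g, diff)]
        )
        pull = [p + d for p, d in zip(pull, diff)]
    # The center drifts toward the replicas (a moving average).
    new_center = [ci + lr * rho * p for ci, p in zip(center, pull)]
    return new_replicas, new_center


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    replicas = [[4.0, -2.0], [-1.0, 3.0]]
    center = [0.0, 0.0]
    for _ in range(300):
        grads = [list(r) for r in replicas]
        replicas, center = elastic_step(replicas, center, grads)
    print(replicas, center)
```

In this toy run the replicas start at different points and are free to explore separately, while the elastic term keeps them from drifting far from the center; both the replicas and the center approach the minimizer at the origin.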

Elastic Averaging for Efficient Pipelined DNN Training.

Pipeline Parallelism With Elastic Averaging

ElasticPipe

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training

vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training

XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

FreezePipe: An Efficient Dynamic Pipeline Parallel Approach Based on Freezing Mechanism for Distributed DNN Training.

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

PipeMare: Asynchronous Pipeline Parallel DNN Training

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

PipePar: A Pipelined Hybrid Parallel Approach for Accelerating Distributed DNN Training

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Pipeline Parallelism for Inference on Heterogeneous Edge Computing