Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu

2023-09-01

Abstract: Distributed deep learning (DL) has become prevalent in recent years as a way to reduce training time for ever-larger models and datasets by leveraging multiple computing devices (e.g., GPUs/TPUs). However, system scalability is limited by communication becoming the performance bottleneck. Addressing this communication issue has become a prominent research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations. We first propose a taxonomy of data-parallel distributed training algorithms that incorporates four primary dimensions: communication synchronization, system architectures, compression techniques, and parallelism of communication and computing tasks. We then investigate state-of-the-art studies that address problems in these four dimensions. We also compare the theoretical convergence rates of different algorithms. Additionally, we conduct extensive experiments to empirically compare the convergence performance of various mainstream distributed training algorithms. Based on our system-level communication cost analysis and our theoretical and experimental convergence speed comparisons, we provide readers with an understanding of which algorithms are more efficient under specific distributed environments. Our research also extrapolates potential directions for further optimizations.

Distributed, Parallel, and Cluster Computing; Machine Learning; Signal Processing

What problem does this paper attempt to address?

The paper addresses communication efficiency in distributed deep learning (DL). As models and datasets continue to grow, training becomes extremely time-consuming and computationally intensive. Distributed training is an effective way to accelerate it, but the accompanying communication costs have become a bottleneck for system scalability. The main objectives of the paper are:

1. **Comprehensive Review**: Review communication-efficient data-parallel distributed deep learning algorithms, covering optimization methods from the system level to the algorithm level.
2. **Classification Framework**: Propose a classification framework that divides data-parallel distributed training algorithms along four main dimensions: communication synchronization, system architecture, compression techniques, and the parallelism of communication and computation tasks.
3. **Algorithm Comparison**: Analyze the convergence speed of different algorithms theoretically, and compare the convergence performance of various mainstream distributed training algorithms through extensive experiments.
4. **Experimental Validation**: Based on the system-level communication cost analysis and the theoretical and experimental convergence speed comparisons, help readers understand which algorithms are more efficient in specific distributed environments.
5. **Future Directions**: Explore potential directions for further optimization.

Through this work, the paper aims to give researchers and engineers a comprehensive understanding of the field and to inspire new efficient distributed training algorithms and frameworks.
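To make the "compression techniques" dimension concrete, here is a minimal sketch of top-k gradient sparsification, one well-known family of compression methods. It is an illustration under stated assumptions, not the paper's implementation: the function names `top_k_sparsify` and `average_sparse` are hypothetical, plain Python lists stand in for gradient tensors, the workers are simulated in one process, and refinements such as error feedback are left out.

```python
from typing import List, Tuple


def top_k_sparsify(grad: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Keep only the k largest-magnitude entries of a gradient.

    Returns (indices, values); a worker would send only these pairs
    instead of the full dense gradient. (Hypothetical helper.)
    """
    # Rank positions by absolute value, largest first.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    idx = sorted(order[:k])
    return idx, [grad[i] for i in idx]


def average_sparse(worker_grads: List[List[float]], k: int) -> List[float]:
    """Average the top-k sparsified gradients of all simulated workers."""
    size = len(worker_grads[0])
    total = [0.0] * size
    for grad in worker_grads:
        idx, vals = top_k_sparsify(grad, k)
        # Accumulate only the entries that were actually "transmitted".
        for i, v in zip(idx, vals):
            total[i] += v
    return [t / len(worker_grads) for t in total]


if __name__ == "__main__":
    # Two simulated workers, each sending 2 of its 4 gradient entries.
    grads = [[0.1, -2.0, 0.3, 4.0], [1.0, 0.2, -3.0, 0.1]]
    print(average_sparse(grads, k=2))
```

In this sketch each worker sends k index-value pairs rather than the full gradient, so communication volume shrinks roughly in proportion to k over the gradient size, at the cost of dropping the small entries.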

Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Communication Optimization Algorithms for Distributed Deep Learning Systems: A Survey

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

Communication Compression Techniques in Distributed Deep Learning: A Survey

Communication optimization strategies for distributed deep neural network training: A survey

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

A Quick Survey on Large Scale Distributed Deep Learning Systems

Communication Patterns in Distributed Deep Learning

A Hierarchical Communication Algorithm for Distributed Deep Learning Training

A Survey on Auto-Parallelism of Large-Scale Deep Learning Training

Guest Editorial Introduction to the Special Section on Communication-Efficient Distributed Machine Learning

SparDL: Distributed Deep Learning Training with Efficient Sparse Communication

A Survey From Distributed Machine Learning to Distributed Deep Learning

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

Distributed Learning Systems with First-order Methods

Understanding Communication Characteristics of Distributed Training