Abstract: Transportation big data (TBD) are increasingly combined with artificial intelligence to mine novel patterns and information, owing to the powerful representational capabilities of deep neural networks (DNNs), especially for anti-COVID-19 applications. The distributed cloud-edge-vehicle training architecture has been applied to accelerate DNN training while ensuring low latency and high privacy for TBD processing. However, the variety of intelligent devices (e.g., intelligent vehicles, edge computing chips at base stations) and networks in intelligent transportation systems leads to heterogeneity in both computing power and communication among distributed nodes. Existing parallel training mechanisms perform poorly on heterogeneous cloud-edge-vehicle clusters. The synchronous parallel mechanism may force fast workers to wait for the slowest worker before synchronizing, thus wasting their computing power. The asynchronous mechanism suffers from communication bottlenecks and can exacerbate the straggler problem, increasing the number of training iterations and even causing incorrect convergence. In this paper, we introduce a distributed training framework, Heter-Train. First, a communication-efficient semi-asynchronous parallel mechanism (SAP-SGD) is proposed, which takes full advantage of the acceleration effect of the asynchronous strategy on heterogeneous training and constrains the straggler problem by using global interval synchronization. Second, considering the difference in node bandwidth, we design a solution for heterogeneous communication. Moreover, a novel weighted aggregation strategy is proposed to aggregate model parameters of different versions. Finally, experimental results show that our proposed strategy can achieve up to a $6.74\times$ speedup in training time, with almost no decrease in accuracy.
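The idea of aggregating model parameters of different versions can be illustrated with a minimal sketch. The abstract does not specify the weighting rule used by Heter-Train, so the code below assumes a simple hypothetical one: each worker's weight decays as 1 / (1 + staleness), where staleness is the gap between the global version and the version the worker trained from. The function names `staleness_weights` and `aggregate` are illustrative and are not taken from the paper.

```python
def staleness_weights(versions, global_version):
    """Return normalized weights, one per worker.

    ASSUMPTION: weight is proportional to 1 / (1 + staleness). This is a
    hypothetical rule for illustration, not the paper's actual strategy.
    """
    raw = []
    for v in versions:
        staleness = global_version - v
        if staleness < 0:
            raise ValueError("worker version cannot exceed the global version")
        raw.append(1.0 / (1.0 + staleness))
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(params, versions, global_version):
    """Weighted average of per-worker parameter vectors (lists of floats)."""
    if not params or len(params) != len(versions):
        raise ValueError("need one version per non-empty parameter list")
    weights = staleness_weights(versions, global_version)
    dim = len(params[0])
    # Combine each coordinate across workers using the staleness weights.
    return [sum(w * p[i] for w, p in zip(weights, params)) for i in range(dim)]


if __name__ == "__main__":
    # Three workers: one up to date, one a version behind, one three behind.
    worker_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    worker_versions = [10, 9, 7]
    print(aggregate(worker_params, worker_versions, global_version=10))
```

Under this assumed rule, an up-to-date worker contributes most to the aggregate while a straggler's stale parameters are discounted rather than discarded, which is the general intent of version-aware aggregation.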

Accelerating Training For Distributed Deep Neural Networks In Mapreduce

Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs over Heterogeneous Infrastructure

PipePar: A Pipelined Hybrid Parallel Approach for Accelerating Distributed DNN Training

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

Training Deep Neural Network on Multiple GPUs with a Model Averaging Method

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training

Distributed Training Optimization for DCU

A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters

Heter-Train: A Distributed Training Framework Based on Semi-Asynchronous Parallel Mechanism for Heterogeneous Intelligent Transportation Systems

MP-DPS: Adaptive Distributed Training for Deep Learning Based on Node Merging and Path Prediction

DNN Training Acceleration Via Exploring GPGPU Friendly Sparsity

A Practical Implementation of GPU based Accelerator for Deep Neural Networks

Optimal distributed parallel algorithms for deep learning framework Tensorflow

Parallel Implementation of Multilayered Neural Networks Based on Map-Reduce on Cloud Computing Clusters.