Abstract: Originating from distributed learning, federated learning enables privacy-preserving collaboration at a new level of abstraction by sharing only model parameters. While current research mainly focuses on optimizing learning algorithms and minimizing the communication overhead inherited from distributed learning, there is still a considerable gap when it comes to real implementation on mobile devices. In this article, we start with an empirical experiment to demonstrate that computation heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and that existing methods are hampered by mobile stragglers. Further, non-identically distributed data across mobile users makes the selection of participants critical to accuracy and convergence. To tackle the computational and statistical heterogeneity, we use data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, one for identically distributed data and one for non-identically distributed data. For identically distributed data, we combine partitioning and linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert the scheduling problem into an average cost minimization problem and propose a greedy algorithm to find a reasonable balance between computation time and accuracy. We also establish an offline profiler to quantify the runtime behavior of different devices, which serves as the input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with common benchmarks, the proposed algorithms achieve a 2–100× epoch-wise speedup and a 2–7 percent accuracy gain, and boost the convergence rate by more than 100 percent on CIFAR10.

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Towards Efficient Scheduling of Federated Mobile Devices under Computational and Statistical Heterogeneity

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Optimizing execution for pipelined-based distributed deep learning in a heterogeneously networked GPU cluster

A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system

Reinforcement Learning Based Online Scheduling of Multiple Workflows in Edge Environment

DDPQN: An Efficient DNN Offloading Strategy in Local-Edge-Cloud Collaborative Environments

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training

Joint DNN Partition and Resource Allocation Optimization for Energy-Constrained Hierarchical Edge-Cloud Systems

HeterPS: Distributed deep learning with reinforcement learning based scheduling in heterogeneous environments

Heter-Train: A Distributed Training Framework Based on Semi-Asynchronous Parallel Mechanism for Heterogeneous Intelligent Transportation Systems

FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Approach for Heterogeneous Edge Devices

Joint Task Partitioning and Parallel Scheduling in Device-Assisted Mobile Edge Networks

Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

An Optimal Network-Aware Scheduling Technique for Distributed Deep Learning in Distributed HPC Platforms

FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Framework for Heterogeneous Edge Devices

Cost-Driven Off-Loading for DNN-Based Applications Over Cloud, Edge, and End Devices

Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading