Abstract: Originating from distributed learning, federated learning enables privacy-preserving collaboration at a new level of abstraction by sharing only the model parameters. While current research mainly focuses on optimizing learning algorithms and minimizing the communication overhead inherited from distributed learning, there is still a considerable gap when it comes to real implementation on mobile devices. In this article, we start with an empirical experiment to demonstrate that computation heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and that existing methods are hampered by mobile stragglers. Further, non-identically distributed data across mobile users makes the selection of participants critical to accuracy and convergence. To tackle the computational and statistical heterogeneity, we use data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, one for identically distributed data and one for non-identically distributed data. For identically distributed data, we combine partitioning and linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert the scheduling problem into an average cost minimization problem and propose a greedy algorithm to find a reasonable balance between computation time and accuracy. We also establish an offline profiler to quantify the runtime behavior of different devices, which serves as the input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with the common benchmarks, the proposed algorithms achieve a 2–100× epoch-wise speedup and a 2–7 percent accuracy gain, and boost the convergence rate by more than 100 percent on CIFAR10.
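To make the linear bottleneck assignment step more concrete, the following is a minimal sketch of the textbook threshold method for that problem. It is not the authors' implementation. It assumes a square cost matrix in which `cost[d][s]` is the profiled time for device `d` to train on data shard `s`. The matrix values, function names, and the one-shard-per-device setup are all hypothetical. The goal is to pick the one-to-one assignment whose slowest device finishes as early as possible.

```python
from typing import List, Optional, Tuple


def _perfect_matching(cost: List[List[float]], limit: float) -> Optional[List[int]]:
    """Try to give every device one shard using only pairs with cost <= limit.

    Returns shard_owner (shard index -> device index), or None if impossible.
    Uses simple augmenting paths (Kuhn's algorithm); fine for small inputs.
    """
    n = len(cost)
    shard_owner = [-1] * n

    def try_assign(device: int, seen: List[bool]) -> bool:
        for shard in range(n):
            if cost[device][shard] <= limit and not seen[shard]:
                seen[shard] = True
                # Take a free shard, or re-route the current owner elsewhere.
                if shard_owner[shard] == -1 or try_assign(shard_owner[shard], seen):
                    shard_owner[shard] = device
                    return True
        return False

    for device in range(n):
        if not try_assign(device, [False] * n):
            return None
    return shard_owner


def bottleneck_assignment(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Minimize the maximum per-device time over all one-to-one assignments.

    Returns (makespan, assignment) where assignment[device] = shard.
    """
    n = len(cost)
    thresholds = sorted({c for row in cost for c in row})
    lo, hi = 0, len(thresholds) - 1
    best = None
    # Binary search for the smallest threshold that still admits a perfect matching.
    while lo <= hi:
        mid = (lo + hi) // 2
        owners = _perfect_matching(cost, thresholds[mid])
        if owners is not None:
            best = (thresholds[mid], owners)
            hi = mid - 1
        else:
            lo = mid + 1
    makespan, owners = best
    assignment = [0] * n
    for shard, device in enumerate(owners):
        assignment[device] = shard
    return makespan, assignment


# Hypothetical profile: rows are devices, columns are shards of growing size.
# Device 2 is the slow straggler, so it should receive the smallest shard.
example_cost = [
    [4, 9, 12],
    [2, 5, 7],
    [8, 20, 30],
]
print(bottleneck_assignment(example_cost))  # (9, [1, 2, 0])
```

In this toy profile the slowest device receives the smallest shard, and the round time drops from 30 (identity assignment) to 9. The sketch only illustrates the min-max objective. How the abstract's partitioning step sizes the shards is not specified here and is left out.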

Locality-aware and Fault-tolerant Batching for Machine Learning on Distributed Datasets

Optimize Scheduling of Federated Learning on Battery-powered Mobile Devices

Towards Efficient Scheduling of Federated Mobile Devices under Computational and Statistical Heterogeneity

LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster

Understanding the Training Dynamics in Federated Deep Learning via Aggregation Weight Optimization

Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments

EP4DDL: addressing straggler problem in heterogeneous distributed deep learning

Adaptive Partitioning and Efficient Scheduling for Distributed DNN Training in Heterogeneous IoT Environment

Adaptive Batchsize Selection and Gradient Compression for Wireless Federated Learning

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture

Online Scheduling Algorithm for Heterogeneous Distributed Machine Learning Jobs

Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training

A Scalable, High-Performance, and Fault-Tolerant Network Architecture for Distributed Machine Learning

Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability

LOAM: Low-latency Communication, Caching, and Computation Placement in Data-Intensive Computing Networks

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

Load Balancing in Federated Learning