Abstract: In the last few years, distributed machine learning has often been executed over heterogeneous networks, such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogeneous networks, the link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform efficient training. Both centralized and decentralized training approaches suffer from low-speed links. In this paper, we propose a decentralized approach, namely NetMax, that enables worker nodes to communicate via high-speed links and thus significantly speeds up the training process. NetMax possesses the following novel features. First, it incorporates a novel consensus algorithm that allows worker nodes to train model copies on their local datasets asynchronously and to synchronize their local copies by exchanging information via peer-to-peer communication, rather than through a central master node (i.e., a parameter server). Second, in each iteration, each worker node randomly selects one peer, with a fine-tuned probability, to exchange information with. In particular, peers with high-speed links are selected with high probability. Third, the probabilities of selecting peers are designed to minimize the total convergence time. Moreover, we mathematically prove the convergence of NetMax. We evaluate NetMax on heterogeneous cluster networks and show that it achieves speedups of 3.7×, 3.4×, and 1.9× in comparison with the state-of-the-art decentralized training approaches Prague, Allreduce-SGD, and AD-PSGD, respectively.
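The peer-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not NetMax's algorithm: the abstract says the selection probabilities are designed to minimize total convergence time, whereas this sketch assumes a simple heuristic that makes each probability proportional to link speed. The function names, the link-speed dictionary, and the plain pairwise averaging step are all hypothetical and chosen only for illustration.

```python
import random


def selection_probabilities(link_speeds):
    """Turn per-peer link speeds into selection probabilities.

    Assumption: probability proportional to link speed. NetMax instead
    derives its probabilities to minimize total convergence time.
    """
    total = sum(link_speeds.values())
    return {peer: speed / total for peer, speed in link_speeds.items()}


def pick_peer(probabilities, rng):
    """Randomly pick one peer for this iteration, weighted by probability."""
    peers = list(probabilities)
    weights = [probabilities[peer] for peer in peers]
    return rng.choices(peers, weights=weights, k=1)[0]


def average_with_peer(local_params, peer_params):
    """Hypothetical synchronization step: element-wise average of two copies."""
    return [(a + b) / 2.0 for a, b in zip(local_params, peer_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical link speeds (e.g., in Mbit/s) from one worker to its peers.
    speeds = {"worker_b": 1000.0, "worker_c": 100.0, "worker_d": 10.0}
    probs = selection_probabilities(speeds)
    peer = pick_peer(probs, rng)
    merged = average_with_peer([1.0, 3.0], [3.0, 5.0])
    print(probs, peer, merged)
```

Under this heuristic, a peer behind a faster link is chosen more often, so most exchanges travel over high-speed links while slow peers are still contacted occasionally.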

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

Decentralized Deep Learning using Momentum-Accelerated Consensus

A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data

DecentLaM - Decentralized Momentum SGD for Large-batch Deep Training

Rethinking the initialization of Momentum in Federated Learning with Heterogeneous Data

Momentum Benefits Non-iid Federated Learning Simply and Provably

Clustered Federated Learning Based on Momentum Gradient Descent for Heterogeneous Data

Global Momentum Compression for Sparse Communication in Distributed Learning

Gradient Scheduling with Global Momentum for Asynchronous Federated Learning in Edge Environment

Hop: Heterogeneity-Aware Decentralized Training

Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

Communication-Efficient Learning of Deep Networks from Decentralized Data

Privacy-preserving Decentralized Deep Learning with Multiparty Homomorphic Encryption

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

Distributed Momentum for Byzantine-resilient Learning

Decentralized Local Updates with Dual-Slow Estimation and Momentum-Based Variance-Reduction for Non-Convex Optimization

Momentum Gradient Descent Federated Learning with Local Differential Privacy

Communication-efficient Decentralized Machine Learning over Heterogeneous Networks

Collaborative Deep Learning Across Multiple Data Centers

DeMo: Decoupled Momentum Optimization