Abstract: Distributed stochastic non-convex optimization problems have recently received attention due to the growing interest of the signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed non-convex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems which communicate gradient acceleration components. We prove that the algorithm's oracle complexity is $\mathcal{O}(1/\epsilon^{1.5})$, and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady-state error. The proposed scheme is tested on several learning tasks: a non-convex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification test on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to existing related methods.
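
To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm, whose exact update rules are not given here. It assumes a generic combination of a momentum (moving-average) gradient estimate with push-pull style gradient tracking: a row-stochastic matrix mixes the iterates, a column-stochastic matrix mixes the trackers, and the graph alternates between two directed topologies. The function names (`mixing_matrices`, `mix`, `run`), the step sizes, the graphs, and the toy convex quadratic local objectives $f_i(x) = \tfrac{1}{2}(x - c_i)^2$ are all assumptions chosen for readability.

```python
import random

def mixing_matrices(n, edges):
    """Build a row-stochastic A and a column-stochastic B for one directed graph.

    edges: list of (sender, receiver) pairs; self-loops are added automatically.
    """
    links = set(edges) | {(i, i) for i in range(n)}
    in_deg = [sum(1 for (_, r) in links if r == i) for i in range(n)]
    out_deg = [sum(1 for (s, _) in links if s == j) for j in range(n)]
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * n for _ in range(n)]
    for s, r in links:
        A[r][s] = 1.0 / in_deg[r]   # receiver averages everything it hears
        B[r][s] = 1.0 / out_deg[s]  # sender splits its tracker mass evenly
    return A, B

def mix(W, v):
    """Matrix-vector product W @ v for plain Python lists."""
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]

def run(targets, graphs, steps=2000, alpha=0.05, beta=0.5, noise_std=0.0, seed=0):
    """Momentum + gradient tracking on the toy objectives 0.5 * (x - c_i)^2.

    Assumed (hypothetical) updates at node i, with A_t, B_t the current graph's weights:
      x_i <- sum_j A_t[i][j] * x_j - alpha * y_i
      m_i <- beta * m_i + (1 - beta) * g_i(x_i)      (momentum estimate)
      y_i <- sum_j B_t[i][j] * y_j + m_i_new - m_i_old  (tracker)
    """
    rng = random.Random(seed)
    n = len(targets)

    def oracle(i, xi):
        # Stochastic first-order oracle: true local gradient plus Gaussian noise.
        return (xi - targets[i]) + rng.gauss(0.0, noise_std)

    x = [0.0] * n
    m = [oracle(i, x[i]) for i in range(n)]
    y = list(m)  # trackers start at the momentum estimates
    for t in range(steps):
        A, B = graphs[t % len(graphs)]  # time-varying topology
        x = [ax - alpha * yi for ax, yi in zip(mix(A, x), y)]
        m_new = [beta * m[i] + (1 - beta) * oracle(i, x[i]) for i in range(n)]
        y = [by + mn - mo for by, mn, mo in zip(mix(B, y), m_new, m)]
        m = m_new
    return x, y, m

if __name__ == "__main__":
    # Two alternating directed graphs on four nodes (illustrative choice).
    graphs = [
        mixing_matrices(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]),
        mixing_matrices(4, [(1, 0), (2, 1), (3, 2), (0, 3), (3, 1)]),
    ]
    x, y, m = run([1.0, 2.0, 3.0, 6.0], graphs, noise_std=0.1)
    print("final iterates:", x)
```

Because each `B` is column-stochastic, the sum of the trackers `y` equals the sum of the momentum estimates `m` at every iteration, which is the property that lets each node follow a network-wide descent direction without a central coordinator. With `noise_std=0.0` the iterates in this toy problem are expected to approach the minimizer of the summed objective, the mean of the `targets`; with noise they hover around it, loosely mirroring the steady-state error mentioned in the abstract.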

A fully decentralized distributed learning algorithm for latency communication networks

A Finite Time Discrete Distributed Learning Algorithm Using Stochastic Configuration Network

Decentralized Federated Learning under Communication Delays

Over-the-air Learning Rate Optimization for Federated Learning

Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks

A Graph Neural Network Based Decentralized Learning Scheme

Overlay-based Decentralized Federated Learning in Bandwidth-limited Networks

Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Decentralized Learning with Lazy and Approximate Dual Gradients

Straggler-aware Distributed Learning: Communication Computation Latency Trade-off

Decentralized Edge Learning via Unreliable Device-to-Device Communications

SPDL: Blockchain-secured and Privacy-preserving Decentralized Learning

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Asynchronous Message-Passing and Zeroth-Order Optimization Based Distributed Learning with a Use-Case in Resource Allocation in Communication Networks

Communication-Constrained Distributed Learning: TSI-Aided Asynchronous Optimization with Stale Gradient

Distributed Learning Meets 6G: A Communication and Computing Perspective

Decentralized Federated Learning: Balancing Communication and Computing Costs

Distributed Stochastic Optimization with Random Communication and Computational Delays: Optimal Policies and Performance Analysis

Distributed Learning for Time-varying Networks: A Scalable Design

Delay-Aware Hierarchical Federated Learning

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient