Abstract: Distributed stochastic non-convex optimization problems have recently received attention due to the growing interest of the signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks that experience communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed non-convex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems that communicate gradient acceleration components. We prove that the algorithm's oracle complexity is $\mathcal{O}(1/\epsilon^{1.5})$, and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady-state error. The proposed scheme is tested on several learning tasks: a non-convex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification task on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to existing related methods.
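To make the two ingredients named in the abstract concrete, the following is a minimal sketch of gradient tracking combined with heavy-ball momentum. It is not the algorithm proposed in the paper: it assumes a fixed, undirected ring with a doubly stochastic mixing matrix `W` (the paper treats time-varying directed networks), and it uses hypothetical quadratic local objectives $f_i(x) = \frac{1}{2}\|x - b_i\|^2$ so that the global minimizer is simply the mean of the $b_i$. The function name, step size `alpha`, momentum weight `beta`, and `noise_std` are all illustrative assumptions.

```python
import numpy as np

def gradient_tracking_momentum(b, W, alpha=0.05, beta=0.3, iters=500,
                               noise_std=0.0, seed=0):
    """Illustrative sketch only (not the paper's method).

    Node i holds f_i(x) = 0.5 * ||x - b[i]||^2 and only sees a noisy
    gradient of it. W is assumed doubly stochastic (fixed, undirected graph).
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape

    def oracle(x):
        # Stochastic first-order oracle: exact local gradient plus Gaussian noise.
        return (x - b) + noise_std * rng.standard_normal((n, d))

    x = np.zeros((n, d))      # row i is node i's current iterate
    x_prev = x.copy()         # previous iterate, used by the momentum term
    g = oracle(x)             # latest local gradient estimates
    y = g.copy()              # gradient trackers, initialised to local gradients

    for _ in range(iters):
        # Mix with neighbours, step along the tracked gradient, add heavy-ball momentum.
        x_next = W @ x - alpha * y + beta * (x - x_prev)
        g_next = oracle(x_next)
        # Tracking update: the average of y stays equal to the average gradient estimate.
        y = W @ y + g_next - g
        x_prev, x, g = x, x_next, g_next
    return x

# Hypothetical 4-node ring: self-weight 1/2, each of the two neighbours 1/4.
P = np.roll(np.eye(4), 1, axis=1)
W = 0.5 * np.eye(4) + 0.25 * (P + P.T)
b = np.arange(8.0).reshape(4, 2)

# With noise_std=0, every row should end up close to b.mean(axis=0).
x_final = gradient_tracking_momentum(b, W)
```

Setting `noise_std` above zero in this sketch leaves the iterates fluctuating around the minimizer rather than settling on it, which is the kind of behavior the abstract's "steady-state error" refers to.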

Distributed Stochastic Gradient Method for Non-Convex Problems with Applications in Supervised Learning

Distributed Learning with Convex Sum-of-Non-Convex Objective

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms

Distributed Proximal Gradient Algorithm for Nonconvex Optimization Over Time-Varying Networks

Distributed Stochastic Algorithm for Global Optimization in Networked System

A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization

Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization

Augmented Distributed Gradient Methods for Multi-Agent Optimization under Uncoordinated Constant Stepsizes

Stochastic Strongly Convex Optimization Via Distributed Epoch Stochastic Gradient Algorithm

Convergence of Asynchronous Distributed Gradient Methods over Stochastic Networks

Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks

Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

A Distributed Stochastic Optimization Algorithm with Gradient-Tracking and Distributed Heavy-Ball Acceleration

Edge-Based Stochastic Gradient Algorithm for Distributed Optimization

A Distributed Conjugate Gradient Online Learning Method over Networks

Distributed Stochastic Gradient Tracking Algorithm with Variance Reduction for Non-Convex Optimization

Primal-dual Stochastic Distributed Algorithm for Constrained Convex Optimization

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Distributed Newton Methods for Deep Neural Networks