Abstract: We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth, strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm, which uses surrogate objective functions in the agents' subproblems (thus going beyond linearization-based schemes such as proximal gradient), coupled with a perturbed (push-sum) consensus mechanism that locally tracks the gradient of $F$. SONATA achieves precision $\epsilon>0$ on the objective value in $\mathcal{O}(\kappa_g \log(1/\epsilon))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(\kappa_g (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\kappa_g$ is the condition number of $F$ and $\rho$ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes that minimize $F$ alone, whose rates depend on quantities much larger than $\kappa_g$ (e.g., the worst-case condition number among the agents). In particular, for empirical risk minimization problems with statistically similar data across the agents, SONATA with high-order surrogates achieves precision $\epsilon>0$ in $\mathcal{O}\big((\beta/\mu) \log(1/\epsilon)\big)$ iterations and $\tilde{\mathcal{O}}\big((\beta/\mu) (1-\rho)^{-1/2} \log(1/\epsilon)\big)$ communication steps, where $\beta$ measures the degree of similarity of the agents' losses and $\mu$ is the strong convexity constant of $F$. Therefore, when $\beta/\mu < \kappa_g$, the use of high-order surrogates yields provably faster rates than what is achievable by first-order models, and it does so without exchanging any Hessian matrix over the network.
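To make the two ingredients named in the abstract concrete (a local surrogate step plus gradient tracking through consensus), here is a minimal Python sketch under heavily simplified assumptions; it is not the SONATA algorithm as specified in the paper. It assumes a static undirected ring with a doubly stochastic weight matrix `W` in place of push-sum over directed, time-varying graphs, a linearized (proximal-gradient) surrogate in place of a high-order one, least-squares local losses, $G = \lambda\|x\|_1$, and $F$ normalized as the average of the agents' losses. All function names, the data, and the step size `gamma` are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the assumed nonsmooth term G).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def local_grads(X, A, b):
    # Row i holds the gradient of f_i(x) = 0.5 * ||A_i x - b_i||^2 at x = X[i].
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(len(A))])

def tracked_prox_gradient(A, b, W, lam, gamma, iters):
    # Each agent i keeps a local iterate X[i] and a tracker Y[i] that
    # estimates the network-average gradient (1/n) * sum_j grad f_j.
    n, d = len(A), A[0].shape[1]
    X = np.zeros((n, d))
    G_old = local_grads(X, A, b)
    Y = G_old.copy()
    for _ in range(iters):
        # Local linearized-surrogate (proximal-gradient) step, then consensus.
        X = W @ soft_threshold(X - gamma * Y, gamma * lam)
        # Gradient tracking: mix trackers and add the local gradient change.
        G_new = local_grads(X, A, b)
        Y = W @ Y + G_new - G_old
        G_old = G_new
    return X

def centralized_prox_gradient(A, b, lam, gamma, iters):
    # Reference solver for (1/n) * sum_i f_i(x) + lam * ||x||_1.
    n, d = len(A), A[0].shape[1]
    x = np.zeros(d)
    for _ in range(iters):
        g = sum(A[i].T @ (A[i] @ x - b[i]) for i in range(n)) / n
        x = soft_threshold(x - gamma * g, gamma * lam)
    return x

# Synthetic problem: 5 agents, 3 variables, 10 samples per agent (assumed).
rng = np.random.default_rng(0)
n, d, m = 5, 3, 10
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

# Ring graph: each agent averages itself and its two neighbors equally.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i + 1) % n] = W[i, (i - 1) % n] = 1.0 / 3.0

# Conservative step size based on the largest local smoothness constant.
L_max = max(np.linalg.eigvalsh(Ai.T @ Ai).max() for Ai in A)
gamma = 0.1 / L_max

X = tracked_prox_gradient(A, b, W, lam=0.1, gamma=gamma, iters=5000)
x_ref = centralized_prox_gradient(A, b, lam=0.1, gamma=gamma, iters=20000)
print("max deviation from centralized solution:", np.abs(X - x_ref).max())
```

The tracker update preserves the network average of `Y` as the average of the current local gradients, which is what lets every agent take a step that approximates a centralized proximal-gradient step. The step size here is deliberately conservative; the rates quoted in the abstract rely on the paper's own surrogate and step-size conditions, which this sketch does not reproduce.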

Random Gradient Extrapolation for Distributed and Stochastic Optimization

An optimal randomized incremental gradient method

Augmented Distributed Gradient Methods for Multi-Agent Optimization under Uncoordinated Constant Stepsizes

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

A Push-Pull Gradient Method for Distributed Optimization in Networks

Distributed Stochastic Algorithm for Global Optimization in Networked System

A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms

Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization

Accelerated Distributed Aggregative Optimization

Distributed Random Reshuffling Methods with Improved Convergence

Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization

Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

Distributed Stochastic Consensus Optimization With Momentum for Nonconvex Nonsmooth Problems

Accelerated Stochastic Algorithms for Nonconvex Finite-sum and Multi-block Optimization

Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization

Decentralized projected Riemannian stochastic recursive momentum method for smooth optimization on compact submanifolds

Non-Smooth Setting of Stochastic Decentralized Convex Optimization Problem Over Time-Varying Graphs

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization