Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that combining decentralized computing with local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm for certain smooth non-convex problems in which the system has $m$ nodes and each node holds a large number of samples (denoted by $n$). Unlike the majority of existing decentralized learning algorithms for either stochastic or finite-sum problems, we focus on both reducing the total number of communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve a certain $\epsilon$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
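
The two ingredients named in the abstract, local gradient estimation from a subset of samples and gradient tracking across nodes, can be illustrated with a minimal sketch. This is not the authors' exact update rule: the recursive mini-batch estimator with periodic full-gradient refresh, the ring topology, the mixing weights, the step size, and the scalar quadratic toy losses are all assumptions chosen to keep the example small and checkable. The toy problem is convex, whereas the paper targets non-convex objectives. The function name `dget_sketch` and all of its parameters are hypothetical.

```python
import random


def dget_sketch(m=4, n=20, rounds=600, alpha=0.05, q=5, batch=4, seed=0):
    """Toy gradient estimation + tracking on a ring of m nodes (illustrative only)."""
    rng = random.Random(seed)
    # Assumed toy data: sample (c, a) defines the loss 0.5 * c * (x - a)^2.
    data = [[(rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)) for _ in range(n)]
            for _ in range(m)]

    def grad(sample, x):
        c, a = sample
        return c * (x - a)

    def full_grad(i, x):
        # Full local gradient at node i: touches all n local samples.
        return sum(grad(s, x) for s in data[i]) / n

    def mix(z):
        # One communication round on a ring with an assumed doubly
        # stochastic mixing matrix: 1/2 on self, 1/4 on each neighbor.
        return [0.5 * z[i] + 0.25 * z[(i - 1) % m] + 0.25 * z[(i + 1) % m]
                for i in range(m)]

    x = [0.0] * m                                # local iterates
    v = [full_grad(i, x[i]) for i in range(m)]   # local gradient estimates
    y = list(v)                                  # trackers of the global gradient

    for r in range(1, rounds + 1):
        # Consensus step followed by a step along the tracked direction.
        x_new = [xm - alpha * yi for xm, yi in zip(mix(x), y)]

        v_new = []
        for i in range(m):
            if r % q == 0:
                # Periodic refresh with the full local gradient.
                v_new.append(full_grad(i, x_new[i]))
            else:
                # Recursive estimate from a small mini-batch of local samples.
                picked = rng.sample(data[i], batch)
                diff = sum(grad(s, x_new[i]) - grad(s, x[i]) for s in picked) / batch
                v_new.append(v[i] + diff)

        # Gradient tracking: mix trackers, then add the change in local estimates.
        y = [ym + vn - vo for ym, vn, vo in zip(mix(y), v_new, v)]
        x, v = x_new, v_new

    # Closed-form minimizer of the averaged toy objective, for comparison.
    num = sum(c * a for node in data for c, a in node)
    den = sum(c for node in data for c, _ in node)
    return x, num / den
```

Calling `x, x_star = dget_sketch()` returns the final local iterates and the closed-form minimizer of the toy objective, so the two can be compared directly. Each iteration uses one mixing of `x` and one of `y`, while most iterations touch only `batch` of the `n` local samples, which is the sample-versus-communication trade-off the abstract describes.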

DINE: Decentralized Inexact Newton with Exact Linear Convergence Rate

Accelerated Distributed Approximate Newton Method

Distributed Inexact Newton Method with Adaptive Step Sizes

Distributed Zeroth-Order Optimization: Convergence Rates That Match Centralized Counterpart

A Communication-Efficient Decentralized Newton's Method with Provably Faster Convergence

On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

On the Divergence of Decentralized Non-Convex Optimization

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

Decentralized projected Riemannian gradient method for smooth optimization on compact submanifolds

Distributed Adaptive Newton Methods with Globally Superlinear Convergence

Distributed adaptive Newton methods with global superlinear convergence

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Decentralized Approximate Newton Methods for Convex Optimization on Networked Systems

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part I

Communication-Efficient Topologies for Decentralized Learning with O(1) Consensus Rate

Decentralized Local Updates with Dual-Slow Estimation and Momentum-Based Variance-Reduction for Non-Convex Optimization

Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

On the Convergence of Decentralized Federated Learning Under Imperfect Information Sharing