Abstract: In this paper, we propose Push-SAGA, a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes. Push-SAGA combines node-level variance reduction to remove the uncertainty caused by stochastic gradients, network-level gradient tracking to address the distributed nature of the data, and push-sum consensus to tackle the challenge of directed communication links. We show that Push-SAGA achieves linear convergence to the exact solution for smooth and strongly convex problems and is thus the first linearly-convergent stochastic algorithm over arbitrary strongly connected directed graphs. We also characterize the regimes in which Push-SAGA achieves a linear speed-up compared to its centralized counterpart and achieves a network-independent convergence rate. We illustrate the behavior and convergence properties of Push-SAGA with the help of numerical experiments on strongly convex and non-convex problems.
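To illustrate the push-sum consensus component named in the abstract, here is a minimal Python sketch of push-sum averaging. It assumes scalar values and a simple split rule in which every node divides its numerator and weight equally among its out-neighbors (itself included), which makes the implied mixing matrix column-stochastic rather than doubly stochastic. The function name `push_sum_average` and the example graph are hypothetical illustrations, not taken from the paper.

```python
def push_sum_average(values, out_neighbors, iterations=100):
    """Push-sum averaging over a directed graph (hypothetical helper).

    values[i] is the scalar initially held by node i; out_neighbors[r] lists
    the nodes that node r sends to, including r itself (self-loop).
    """
    n = len(values)
    x = [float(v) for v in values]  # push-sum numerators
    y = [1.0] * n                   # push-sum weights
    for _ in range(iterations):
        new_x = [0.0] * n
        new_y = [0.0] * n
        for r in range(n):
            # Node r splits its numerator and weight evenly among its out-neighbors,
            # so each column of the implied mixing matrix sums to 1.
            share = 1.0 / len(out_neighbors[r])
            for i in out_neighbors[r]:
                new_x[i] += share * x[r]
                new_y[i] += share * y[r]
        x, y = new_x, new_y
    # The ratio removes the bias that unbalanced (non-doubly-stochastic)
    # mixing places on x and y individually.
    return [xi / yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    graph = [[0, 1, 2], [1, 2], [2, 0]]  # edges 0->1, 0->2, 1->2, 2->0 plus self-loops
    print(push_sum_average([3.0, 6.0, 9.0], graph))
```

On this unbalanced three-node graph, the weights `y` settle at unequal values, yet every ratio `x_i / y_i` approaches 6.0, the true average. This is why push-sum works over directed links where a doubly stochastic matrix cannot be built locally.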

What problem does this paper attempt to address?

This paper addresses several key problems that arise in distributed stochastic optimization over directed networks:

1. **Variance of stochastic gradients**: In Distributed Stochastic Gradient Descent (DSGD), stochastic gradients introduce variance that slows convergence and limits accuracy; with a constant step size, DSGD converges only to a neighborhood of the optimal solution.
2. **Dissimilarity between local and global costs**: The local cost function \( f_i \) at each node differs from the global cost function \( F \), so a node's local minimizer is generally not the global one, which makes it difficult for the nodes to agree on the optimal solution.
3. **Directed communication**: In a directed graph the communication links are unidirectional, so traditional methods that rely on symmetric, doubly stochastic weight matrices are no longer applicable.

To address these problems, the paper proposes the **Push-SAGA** algorithm, which combines three techniques:

- **Node-level variance reduction**: SAGA-based gradient estimators approximate each node's local batch gradient, reducing the variance introduced by stochastic gradients.
- **Network-level gradient tracking**: Dynamic average consensus tracks the global gradient, which resolves the mismatch between local and global costs.
- **Push-sum consensus**: Push-sum handles the asymmetric links of a directed graph so that the nodes still reach agreement.

### Main contributions

1. **Linear convergence**: For smooth and strongly convex cost functions, Push-SAGA is the first stochastic method that achieves linear convergence over arbitrary strongly connected directed graphs.
2. **Directionality constant**: The paper introduces a directionality constant \( \psi \geq 1 \) that explicitly quantifies the directionality of the underlying graph.
3. **Linear speed-up and network-independent convergence**: In the big-data regime, Push-SAGA has complexity \( O(M\log\frac{1}{\epsilon}) \), which is \( n \) times faster than centralized SAGA, and its convergence rate is independent of the network parameters.
4. **Improved performance**: In the big-data regime, Push-SAGA has a better joint dependence on the condition number \( \kappa \) and the number of samples \( m \) than related linearly convergent methods.

Through these improvements, Push-SAGA improves the efficiency and accuracy of distributed optimization and extends the reach of existing methods to directed networks.
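The three techniques listed above can be combined into a single iteration. The sketch below is a hypothetical, simplified illustration, not the authors' pseudocode. It assumes a scalar decision variable, local costs \( f_i(z) = \frac{1}{m_i}\sum_j \frac{1}{2}(z - a_{ij})^2 \), the same even-split column-stochastic mixing as plain push-sum, and a generic push-sum gradient-tracking update order. The name `push_saga`, the step size `alpha`, and the iteration count are illustrative choices.

```python
import random


def push_saga(samples, out_neighbors, alpha=0.05, iterations=3000, seed=0):
    """Hypothetical Push-SAGA-style sketch with a scalar decision variable.

    Assumed local cost at node i: f_i(z) = (1/m_i) * sum_j 0.5 * (z - a_ij)**2,
    where samples[i] = [a_i1, ..., a_im_i]. out_neighbors[r] lists the nodes
    that node r sends to, including r itself (self-loop).
    """
    rng = random.Random(seed)
    n = len(samples)

    def grad(a, z):
        # Gradient of one component 0.5 * (z - a)**2.
        return z - a

    def mix(v):
        # Column-stochastic push: node r splits v[r] evenly among its out-neighbors.
        out = [0.0] * n
        for r in range(n):
            share = v[r] / len(out_neighbors[r])
            for i in out_neighbors[r]:
                out[i] += share
        return out

    z = [0.0] * n   # local estimates of the minimizer
    x = list(z)     # push-sum numerators (x^0 = z^0 because y^0 = 1)
    y = [1.0] * n   # push-sum weights
    # SAGA table: most recent gradient of every local component, plus its running mean.
    table = [[grad(a, z[i]) for a in samples[i]] for i in range(n)]
    table_avg = [sum(t) / len(t) for t in table]
    g = list(table_avg)  # SAGA estimators (exact local gradients at the start)
    w = list(g)          # gradient trackers, initialized to the local estimators

    for _ in range(iterations):
        # Push-sum step on the numerators, moving along the tracked global gradient.
        bx = mix(x)
        x = [bx[i] - alpha * w[i] for i in range(n)]
        y = mix(y)
        z = [x[i] / y[i] for i in range(n)]  # de-biased estimates

        # Node-level variance reduction: one uniformly sampled component per node.
        g_new = []
        for i in range(n):
            m_i = len(samples[i])
            s = rng.randrange(m_i)
            fresh = grad(samples[i][s], z[i])
            g_new.append(fresh - table[i][s] + table_avg[i])  # SAGA estimator
            table_avg[i] += (fresh - table[i][s]) / m_i        # keep mean in sync
            table[i][s] = fresh

        # Network-level gradient tracking: mix trackers, add the estimator change.
        bw = mix(w)
        w = [bw[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return z


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.0, -1.0, 2.0, 3.0]]
    graph = [[0, 1, 2], [1, 2], [2, 0]]
    print(push_saga(data, graph))
```

Under these assumptions, the minimizer of \( F = \frac{1}{n}\sum_i f_i \) is the average of the local sample means, here \( (2.5 + 6.5 + 1.0)/3 = 10/3 \). All three node estimates should approach that value even though each node evaluates only one sampled gradient per iteration. The table stores past component gradients rather than past iterates, a common SAGA implementation choice that avoids recomputing them.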

Push-SAGA: A decentralized stochastic algorithm with variance reduction over directed graphs