Muhammad I. Qureshi, Ran Xin, Soummya Kar, Usman A. Khan
Abstract: In this paper, we propose Push-SAGA, a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes. Push-SAGA combines node-level variance reduction to remove the uncertainty caused by stochastic gradients, network-level gradient tracking to address the distributed nature of the data, and push-sum consensus to tackle the challenge of directed communication links. We show that Push-SAGA achieves linear convergence to the exact solution for smooth and strongly convex problems and is thus the first linearly-convergent stochastic algorithm over arbitrary strongly connected directed graphs. We also characterize the regimes in which Push-SAGA achieves a linear speed-up compared to its centralized counterpart and achieves a network-independent convergence rate. We illustrate the behavior and convergence properties of Push-SAGA with the help of numerical experiments on strongly convex and non-convex problems.
Machine Learning; Distributed, Parallel, and Cluster Computing; Multiagent Systems; Systems and Control
What problem does this paper attempt to address?
This paper addresses three difficulties that arise in decentralized stochastic optimization over directed networks:
1. **Variance problem**: In Distributed Stochastic Gradient Descent (DSGD), each node uses a stochastic gradient in place of its full local gradient. The variance of these stochastic gradients degrades both the convergence speed and the accuracy of the final solution.
2. **Difference between local and global costs**: The local cost function \( f_i \) at each node differs from the global cost function \( F \), the average of the \( f_i \) over all nodes, so a node that minimizes only its own cost does not arrive at the common global minimizer.
3. **Directed-graph communication challenges**: In a directed graph the communication links are unidirectional, so traditional methods that rely on symmetric, doubly-stochastic weight matrices no longer apply.
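The third difficulty can be illustrated numerically. The sketch below is not taken from the paper: the 4-node graph, the initial values, and the helper names (`column_stochastic`, `mix`, `push_sum`) are assumptions chosen for illustration. On a directed graph each node can only choose how to split what it sends, which gives column-stochastic weights; repeated mixing then preserves the sum but does not drive the nodes to the average. The standard push-sum remedy runs a second sequence started at 1 and takes the ratio of the two.

```python
def column_stochastic(out_neighbors):
    """Build B where node r splits its weight equally over out_neighbors[r]."""
    n = len(out_neighbors)
    B = [[0.0] * n for _ in range(n)]
    for r, outs in enumerate(out_neighbors):
        for i in outs:
            B[i][r] = 1.0 / len(outs)
    return B


def mix(B, v):
    """One communication round: node i sums the weighted values it receives."""
    return [sum(b * vr for b, vr in zip(row, v)) for row in B]


def push_sum(values, out_neighbors, iters=300):
    """Return (raw mixed values, push-sum ratios) after `iters` rounds."""
    B = column_stochastic(out_neighbors)
    x = list(values)           # quantity to be averaged
    y = [1.0] * len(values)    # auxiliary weights, all started at 1
    for _ in range(iters):
        x, y = mix(B, x), mix(B, y)
    return x, [xi / yi for xi, yi in zip(x, y)]


# Hypothetical strongly connected directed graph (self-loops included):
# 0 -> {0, 1, 2}, 1 -> {1, 2}, 2 -> {2, 3}, 3 -> {3, 0}
OUT_NEIGHBORS = [[0, 1, 2], [1, 2], [2, 3], [3, 0]]
naive, corrected = push_sum([1.0, 2.0, 3.0, 4.0], OUT_NEIGHBORS)
```

Here `naive` settles at node-dependent values that are biased by the graph, while every entry of `corrected` approaches the true average of the initial values.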
To solve these problems, the paper proposes the **Push-SAGA** algorithm, which combines three techniques:
- **Node-level variance reduction**: Each node uses a SAGA-based gradient estimator to estimate its local batch gradient, which reduces the variance introduced by stochastic gradients.
- **Network-level gradient tracking**: Nodes estimate the global gradient through dynamic average consensus, which bridges the gap between the local and global costs.
- **Push-sum consensus**: Handles the asymmetric communication links of a directed graph so that the nodes can still reach agreement.
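The following is a minimal sketch of how the three techniques can fit together in one iteration, written from the description above rather than copied from the paper's pseudocode. The scalar least-squares costs \( \frac{1}{2}(a x - b)^2 \), the 4-node graph, the step size, and all function and variable names are assumptions made for illustration only.

```python
import random


def push_saga(data, out_neighbors, alpha=0.02, iters=4000, seed=0):
    """data[i] lists the (a, b) pairs held by node i; each pair defines the
    scalar component cost 0.5 * (a * x - b) ** 2."""
    rng = random.Random(seed)
    n = len(data)

    def grad(a, b, x):
        return a * (a * x - b)

    # Column-stochastic weights: node r splits equally over its out-neighbours.
    B = [[0.0] * n for _ in range(n)]
    for r, outs in enumerate(out_neighbors):
        for i in outs:
            B[i][r] = 1.0 / len(outs)

    def mix(v):
        return [sum(B[i][r] * v[r] for r in range(n)) for i in range(n)]

    x = [0.0] * n   # push-sum numerators
    y = [1.0] * n   # push-sum weights
    z = [0.0] * n   # local estimates z_i = x_i / y_i
    # SAGA table: last gradient evaluated for every local sample.
    table = [[grad(a, b, z[i]) for a, b in data[i]] for i in range(n)]
    g = [sum(t) / len(t) for t in table]   # local gradient estimates
    w = g[:]                               # gradient trackers

    for _ in range(iters):
        # Push-sum consensus combined with a descent step along the tracker.
        x = [xi - alpha * wi for xi, wi in zip(mix(x), w)]
        y = mix(y)
        z = [xi / yi for xi, yi in zip(x, y)]
        # Node-level variance reduction: SAGA estimator from one random sample.
        g_new = []
        for i in range(n):
            s = rng.randrange(len(data[i]))
            a, b = data[i][s]
            fresh = grad(a, b, z[i])
            g_new.append(fresh - table[i][s] + sum(table[i]) / len(table[i]))
            table[i][s] = fresh
        # Network-level gradient tracking over the same directed weights.
        w = [wi + gn - go for wi, gn, go in zip(mix(w), g_new, g)]
        g = g_new
    return z


# Hypothetical data: 4 nodes with 5 samples each, on a directed graph.
_data_rng = random.Random(1)
DATA = [[(_data_rng.uniform(0.5, 1.5), _data_rng.uniform(-1.0, 1.0))
         for _ in range(5)] for _ in range(4)]
OUT_NEIGHBORS = [[0, 1, 2], [1, 2], [2, 3], [3, 0]]

z_final = push_saga(DATA, OUT_NEIGHBORS)
# With equal local sample counts, the global minimizer has a closed form.
x_star = (sum(a * b for node in DATA for a, b in node)
          / sum(a * a for node in DATA for a, b in node))
```

Because the SAGA estimator's variance vanishes as the table entries approach the gradients at the solution, a constant step size suffices, and each entry of `z_final` should agree with `x_star` up to numerical precision under these assumed settings.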
### Main contributions
1. **Linear convergence**: Push-SAGA is the first stochastic method to achieve linear convergence over arbitrary strongly connected directed graphs when minimizing smooth and strongly convex cost functions.
2. **Directionality constant**: The paper introduces a directionality constant \( \psi\geq1 \), which explicitly quantifies how directed the underlying graph is.
3. **Linear speed-up and network-independent convergence**: In big-data scenarios, the complexity of Push-SAGA is \( O(M\log\frac{1}{\epsilon}) \), where \( M \) is the largest number of samples held by any single node. This is \( n \) times faster than centralized SAGA, with \( n \) the number of nodes, and the rate is independent of the network parameters.
4. **Performance improvement**: In big-data scenarios, Push-SAGA outperforms related linearly-convergent methods in its joint dependence on the condition number \( \kappa \) and the number of samples \( m \).
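The \( n \)-fold speed-up in item 3 can be checked with a short calculation. This is a sketch under two assumptions not stated in the summary: every node holds the same number of samples \( M \), so the centralized problem has \( nM \) samples, and the data is large enough that centralized SAGA's well-known \( O((N+\kappa)\log\frac{1}{\epsilon}) \) complexity for \( N \) samples is dominated by \( N = nM \).

```latex
\[
\underbrace{O\!\left(nM \log\tfrac{1}{\epsilon}\right)}_{\text{centralized SAGA on } nM \text{ samples}}
\quad\text{vs.}\quad
\underbrace{O\!\left(M \log\tfrac{1}{\epsilon}\right)}_{\text{Push-SAGA, per node, in parallel}}
\qquad\Longrightarrow\qquad
\text{speed-up} \approx \frac{nM}{M} = n .
\]
```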
Through these improvements, Push-SAGA improves the efficiency and accuracy of decentralized optimization and extends linearly-convergent stochastic methods to directed networks.