Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Differently from the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is given to both reducing the total communication rounds among the nodes, while accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve certain $\epsilon$ stationary solution of the deterministic finite sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
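
As a rough illustration of what the stated bounds imply, the prior bounds can be divided by the D-GET bounds. The $\mathcal{O}(\cdot)$ notation hides constants and any network-dependent factors, so the ratios below indicate scaling only, and the values of $n$ and $\epsilon$ used afterwards are hypothetical:

$$
\frac{mn\,\epsilon^{-1}}{mn^{1/2}\epsilon^{-1}} = n^{1/2},
\qquad
\frac{m\,\epsilon^{-2}}{m\,\epsilon^{-3/2}} = \epsilon^{-1/2},
\qquad
\frac{\epsilon^{-2}}{\epsilon^{-1}} = \epsilon^{-1}.
$$

For instance, with $n = 10^{4}$ samples per node the finite-sum sample bound shrinks by a factor of $n^{1/2} = 100$. With $\epsilon = 10^{-4}$ in the online setting, the sample bound shrinks by $\epsilon^{-1/2} = 100$ and the communication bound by $\epsilon^{-1} = 10^{4}$.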

What problem does this paper attempt to address?

The paper addresses how to reduce, at the same time, the total number of communication rounds and the number of local data samples accessed in decentralized non-convex optimization. Specifically, it asks how to design an effective decentralized stochastic algorithm for certain smooth non-convex problems in a system with $m$ nodes, where each node holds a large number of samples (denoted $n$). Unlike most existing decentralized learning algorithms, the proposed algorithm targets both costs jointly: fewer communication rounds among the nodes and fewer accesses to local data samples.

### Main Contributions

1. **A new decentralized gradient estimation and tracking algorithm (D-GET)**:
   - The algorithm estimates each local gradient from a subset of local samples and tracks the global gradient using differences of successive local gradient estimates.
   - For deterministic finite-sum problems, D-GET achieves a sample complexity of $O(mn^{1/2}\epsilon^{-1})$ and a communication complexity of $O(\epsilon^{-1})$.
   - For online problems, D-GET achieves a sample complexity of $O(m\epsilon^{-3/2})$ and a communication complexity of $O(\epsilon^{-1})$.
2. **Improved complexity over existing methods**:
   - For finite-sum problems, the sample complexity improves from the best existing $O(mn\epsilon^{-1})$ to $O(mn^{1/2}\epsilon^{-1})$, while the communication complexity matches the best existing $O(\epsilon^{-1})$.
   - For online problems, both bounds improve: the sample complexity from $O(m\epsilon^{-2})$ to $O(m\epsilon^{-3/2})$, and the communication complexity from $O(\epsilon^{-2})$ to $O(\epsilon^{-1})$.
3. **Theoretical analysis**:
   - The paper analyzes the convergence of D-GET in detail and proves the sample and communication complexity bounds listed above.
   - D-GET introduces two auxiliary variables, $v$ and $y$, which estimate the local and global gradients respectively. In this way it combines modern variance reduction techniques with decentralized gradient tracking.

### Application Scenarios

- **Large-scale machine learning tasks**: For highly non-convex problems such as training deep neural networks, D-GET is designed to reduce both communication overhead and the number of sample accesses.
- **Distributed computing**: D-GET applies to settings where decentralized computation is preferred, for example for data privacy, network robustness, or computational efficiency.

### Conclusion

With D-GET, the paper advances decentralized non-convex optimization by lowering sample complexity in the finite-sum setting and both sample and communication complexity in the online setting. This provides new ideas and tools for large-scale machine learning and distributed computing.
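
The interplay between the local estimate $v$ and the tracker $y$ described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact algorithm: it pairs a generic recursive (SARAH/SPIDER-style) variance-reduced estimator with standard gradient tracking. The ring graph, the least-squares loss (convex, chosen only so the behavior is easy to check), the step size `alpha`, the refresh period `q`, the mini-batch size `batch`, and all function names are assumptions made for the example.

```python
import numpy as np

def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] += 0.5
        W[i, (i - 1) % m] += 0.25
        W[i, (i + 1) % m] += 0.25
    return W

def batch_grad(A, b, x, idx):
    """Gradient of (1 / (2 * len(idx))) * ||A[idx] @ x - b[idx]||^2 at x."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def run_sketch(m=4, n=32, d=5, alpha=0.05, q=8, batch=8, rounds=300, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic local data: node i holds (A[i], b[i]) with n samples.
    A = rng.normal(size=(m, n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.1 * rng.normal(size=(m, n))
    W = ring_mixing_matrix(m)
    full = np.arange(n)

    def global_grad_norm(x_bar):
        g = np.mean([batch_grad(A[i], b[i], x_bar, full) for i in range(m)], axis=0)
        return np.linalg.norm(g)

    x = np.zeros((m, d))                      # one iterate per node
    v = np.stack([batch_grad(A[i], b[i], x[i], full) for i in range(m)])
    y = v.copy()                              # tracker starts at the local estimate
    history = [global_grad_norm(x.mean(axis=0))]

    for r in range(1, rounds + 1):
        # Communication + descent: mix with neighbors, step along the tracker.
        x_new = W @ x - alpha * y
        v_new = np.empty_like(v)
        for i in range(m):
            if r % q == 0:
                # Periodic refresh with the full local gradient.
                v_new[i] = batch_grad(A[i], b[i], x_new[i], full)
            else:
                # Recursive update from a mini-batch gradient difference.
                idx = rng.choice(n, size=batch, replace=False)
                v_new[i] = (v[i]
                            + batch_grad(A[i], b[i], x_new[i], idx)
                            - batch_grad(A[i], b[i], x[i], idx))
        # Gradient tracking: mix trackers, add the change in local estimates.
        y = W @ y + v_new - v
        x, v = x_new, v_new
        history.append(global_grad_norm(x.mean(axis=0)))
    return x, v, y, history

if __name__ == "__main__":
    x, v, y, history = run_sketch()
    print("initial / final global gradient norm:", history[0], history[-1])
```

Because the mixing matrix is doubly stochastic and `y` is initialized to `v`, the network average of `y` equals the network average of `v` at every round, which is the property that lets each node follow an estimate of the global gradient while only exchanging information with its neighbors. Each round uses one exchange of `x` and `y` per node, and full local gradients are computed only every `q` rounds.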

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Gradient tracking and variance reduction for decentralized optimization and machine learning

Multi-consensus Decentralized Accelerated Gradient Descent

On the Divergence of Decentralized Non-Convex Optimization

Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization

Faster Adaptive Decentralized Learning Algorithms

Optimal Gradient Tracking for Decentralized Optimization

Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization

Balancing Communication and Computation in Gradient Tracking Algorithms for Decentralized Optimization

Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

A Flexible Gradient Tracking Algorithmic Framework for Decentralized Optimization

Decentralized Stochastic Subgradient Methods for Nonsmooth Nonconvex Optimization

Snap-Shot Decentralized Stochastic Gradient Tracking Methods

A Communication-Efficient Decentralized Newton's Method with Provably Faster Convergence

Communication-efficient algorithms for decentralized and stochastic optimization

A Communication-efficient Linearly Convergent Algorithm with Variance Reduction for Distributed Stochastic Optimization

Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization