Distributed Stochastic Optimization with Random Communication and Computational Delays: Optimal Policies and Performance Analysis

Siyuan Yu, Wei Chen, H. Vincent Poor
DOI: https://doi.org/10.1109/icc51166.2024.10622795
2024-01-01
Abstract: Distributed stochastic optimization has attracted considerable attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in decentralized machine learning. However, stragglers and limited bandwidth may induce random computational and communication delays, thereby severely hindering the optimization or learning process. We are therefore interested in optimal policies, and their performance analysis, for latency-aware distributed Stochastic Gradient Descent (SGD). To understand the effect of gradient staleness and gradient error in distributed optimization, both of which may determine the convergence time, we present a unified framework based on the stochastic delay differential equation to characterize the random convergence time. Interestingly, we find that the average convergence time is much more sensitive to the staleness of the gradient than to its error. To provide further insight, we show that the time cost of fully asynchronous SGD is approximately determined by the product of the gradient staleness and the 2-norm of the Hessian matrix of the objective function. Moreover, small staleness may slightly accelerate SGD, while large staleness will cause it to diverge.
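The qualitative claim about staleness can be illustrated with a toy experiment. The sketch below is not the paper's stochastic delay differential equation framework. It assumes a one-dimensional quadratic objective f(x) = 0.5 * hessian * x**2, where the scalar `hessian` stands in for the 2-norm of the Hessian, and a fixed (not random) delay of `staleness` steps. The function name `delayed_sgd` and all of its parameters are hypothetical choices made for illustration.

```python
import random


def delayed_sgd(hessian, lr, staleness, steps=2000, noise_std=0.0, seed=0):
    """Run SGD on f(x) = 0.5 * hessian * x**2 with gradients `staleness` steps old.

    Returns |x| after `steps` updates. Toy model only: the delay is fixed,
    whereas the paper studies random delays.
    """
    rng = random.Random(seed)
    # history[-1] is the current iterate; pad so a stale iterate always exists.
    history = [1.0] * (staleness + 1)
    for _ in range(steps):
        stale_x = history[-(staleness + 1)]          # iterate from `staleness` steps ago
        grad = hessian * stale_x + rng.gauss(0.0, noise_std)  # stale, optionally noisy gradient
        history.append(history[-1] - lr * grad)      # update the *current* iterate
    return abs(history[-1])


if __name__ == "__main__":
    # Same step size and curvature; only the staleness changes.
    for tau in (0, 5, 30):
        print(tau, delayed_sgd(hessian=1.0, lr=0.1, staleness=tau))
```

Under these assumptions, the runs with staleness 0 and 5 shrink toward the minimizer, while the run with staleness 30 grows without bound, even though the step size and curvature are unchanged. Raising `hessian` at a fixed staleness pushes the toy model toward divergence in the same way, which is in the spirit of the staleness-times-Hessian-norm product described in the abstract.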