Communication Efficient Decentralized Training with Multiple Local Updates

Li Xiang, Yang Wenhao, Wang Shusen, Zhang Zhihua
2019-01-01
Abstract: Communication efficiency plays a significant role in decentralized optimization, especially when the data are highly non-identically distributed. In this paper, we propose a novel algorithm, Periodic Decentralized SGD (PD-SGD), to reduce the communication cost in a decentralized heterogeneous network. PD-SGD alternates between multiple local updates and multiple decentralized communications, making communication more flexible and controllable. We prove that PD-SGD converges at a rate of $O(\frac{1}{\sqrt{nT}})$ in the setting of stochastic non-convex optimization with non-i.i.d. data, where $n$ is the number of worker nodes. We also propose a novel decay strategy that periodically shrinks the number of consecutive local updates. Equipped with this strategy, PD-SGD better balances the communication-convergence trade-off, both theoretically and empirically.
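The alternation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scalar quadratic objectives, the ring topology with uniform 1/3 mixing weights, the halving schedule for the decay strategy, and all names (`ring_mixing`, `pd_sgd`, `local_steps`, `comm_steps`, `decay_every`) are assumptions chosen only to make the idea concrete.

```python
import random


def ring_mixing(values):
    """One gossip step on a ring: each worker averages itself with its two
    neighbours (assumed uniform weights 1/3, which preserve the global mean)."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def pd_sgd(centers, rounds=50, local_steps=8, comm_steps=2, lr=0.1,
           noise=0.01, decay_every=10, seed=0):
    """Toy PD-SGD-style loop. Worker i minimises 0.5 * (x - centers[i])**2,
    so the data are heterogeneous and the global optimum is mean(centers)."""
    rng = random.Random(seed)
    x = [0.0] * len(centers)  # one scalar parameter per worker
    for r in range(rounds):
        # Hypothetical decay schedule: halve the local phase every
        # `decay_every` rounds, never going below one step.
        if decay_every and r > 0 and r % decay_every == 0:
            local_steps = max(1, local_steps // 2)
        # Phase 1: several local SGD steps with noisy gradients, no communication.
        for _ in range(local_steps):
            x = [xi - lr * ((xi - ci) + rng.gauss(0.0, noise))
                 for xi, ci in zip(x, centers)]
        # Phase 2: several decentralized communication (gossip) steps.
        for _ in range(comm_steps):
            x = ring_mixing(x)
    return x


if __name__ == "__main__":
    final = pd_sgd([-2.0, 0.0, 1.0, 5.0])
    print(final)  # each worker should end up close to mean(centers) = 1.0
```

In this toy setting, a longer local phase saves communication but lets each worker drift toward its own optimum, while extra gossip steps pull the workers back toward consensus. Shrinking the local phase over time trades some of the communication savings for tighter agreement later in training.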