A New Analysis of Variance Reduced Stochastic Proximal Methods for Composite Optimization with Serial and Asynchronous Realizations.

Yue Yu,Longbo Huang
2018-01-01
Abstract:Modern distributed training of machine learning models suffers from high communication overhead for synchronizing stochastic gradients and model parameters. In this paper, to reduce the communication complexity, we propose \emph{double quantization}, a general scheme for quantizing both model parameters and gradients. Three communication-efficient algorithms are proposed under this general scheme. Specifically, (i) we propose a low-precision algorithm AsyLPG with asynchronous parallelism, (ii) we explore integrating gradient sparsification with double quantization and develop Sparse-AsyLPG, (iii) we show that double quantization can also be accelerated by momentum technique and design accelerated AsyLPG. We establish rigorous performance guarantees for the algorithms, and conduct experiments on a multi-server test-bed to demonstrate that our algorithms can effectively save transmitted bits without performance degradation.
What problem does this paper attempt to address?