AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning
Guangfeng Yan, Tan Li, Shao-Lun Huang, Tian Lan, Linqi Song
DOI: https://doi.org/10.1109/jsac.2022.3192050
IF: 16.4
2022-01-01
IEEE Journal on Selected Areas in Communications
Abstract: Gradient compression (e.g., gradient quantization and gradient sparsification) is a core technique for reducing communication costs in distributed learning systems. The recent trend in gradient compression is to use a varying number of bits across iterations; however, existing approaches rely on empirical observations or engineering heuristics without systematic treatment and analysis. To the best of our knowledge, general dynamic gradient compression that leverages both quantization and sparsification is still far from well understood. This paper proposes a novel Adaptively-Compressed Stochastic Gradient Descent (AC-SGD) strategy that adjusts the number of quantization bits and the sparsification size with respect to the norm of the gradients, the communication budget, and the remaining number of iterations. In particular, we derive an upper bound, tight in some cases, on the convergence error of an arbitrary dynamic compression strategy. We then consider communication budget constraints and propose an optimization formulation, denoted the Adaptive Compression Problem (ACP), for minimizing the deep model’s convergence error under such constraints. By solving the ACP, we obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, through extensive experiments on computer vision (MNIST, CIFAR-10, CIFAR-100) and natural language processing (AG-News) datasets, we demonstrate that our compression scheme significantly outperforms state-of-the-art gradient compression methods in mitigating communication costs.
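To make the two compression primitives named in the abstract concrete, the following is a minimal NumPy sketch of top-k sparsification followed by unbiased stochastic uniform quantization. It is an illustration under stated assumptions and not the authors' AC-SGD algorithm: the function names (`topk_sparsify`, `stochastic_quantize`, `compress`) are hypothetical, and the rule by which AC-SGD chooses `bits` and `k` at each iteration (by solving the ACP) is not reproduced here; both are simply passed in by the caller.

```python
import numpy as np


def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest."""
    flat = grad.ravel()
    k = max(1, min(k, flat.size))
    # argpartition places the k largest |values| in the last k positions.
    idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(grad.shape)


def stochastic_quantize(grad, bits, rng):
    """Unbiased uniform quantization of |grad| / ||grad|| onto 2**bits - 1 steps.

    Each magnitude is rounded up or down at random so that the
    expectation of the output equals the input.
    """
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    levels = 2 ** bits - 1
    scaled = np.abs(grad) / norm * levels      # values in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                   # probability of rounding up
    rounded = lower + (rng.random(grad.shape) < prob_up)
    return np.sign(grad) * norm * rounded / levels


def compress(grad, bits, k, rng):
    """Hypothetical combined compressor: sparsify first, then quantize."""
    return stochastic_quantize(topk_sparsify(grad, k), bits, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.array([0.1, -3.0, 0.5, 2.0])
    # A dynamic scheme would vary bits and k per iteration; here they are fixed.
    print(compress(g, bits=4, k=2, rng=rng))
```

In a dynamic scheme of the kind the abstract describes, `bits` and `k` would change from one iteration to the next, trading compression error against the bits transmitted; this sketch only shows the per-iteration compression step.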