Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Yutong He, Xinmeng Huang, Kun Yuan
2024-01-11
Abstract: Communication compression is a common technique in distributed optimization that can alleviate communication overhead by transmitting compressed gradients and model parameters. However, compression can introduce information distortion, which slows down convergence and incurs more communication rounds to achieve desired solutions. Given the trade-off between lower per-round communication costs and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, as well as the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression. We demonstrate that unbiased compression alone does not necessarily save the total communication cost, but this outcome can be achieved if the compressors used by all workers are further assumed independent. We establish lower bounds on the communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions and show that these lower bounds are tight by refining the analysis for ADIANA. Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are constrained by a common upper bound, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.
Machine Learning; Distributed, Parallel, and Cluster Computing; Optimization and Control
What problem does this paper attempt to address?
The paper asks: **in distributed optimization, can unbiased compression reduce the total communication cost, under what conditions, and by how much at most?** Specifically, it studies two core questions:

1. **Can unbiased compression alone reduce the total communication cost?**
   - Unbiased compression is a widely used technique that lowers per-round communication overhead by transmitting compressed gradients and model parameters. However, compression introduces information distortion, which slows convergence and increases the number of communication rounds needed to reach a desired solution. It is therefore unclear whether the net effect is a saving.
   - The paper's theoretical analysis shows that **unbiased compression alone does not necessarily reduce the total communication cost**: in the worst case, the savings per round are offset by the additional rounds required.
2. **Under what additional condition can unbiased compression reduce the total communication cost, and by how much at most?**
   - The paper then assumes that the compressors used by different workers are mutually independent. Independence produces an error-cancellation effect: when the workers' compressed vectors are averaged, their errors partially cancel, so the average is more accurate and fewer extra rounds are needed.
   - Under this independence assumption, and when all local smoothness constants share a common upper bound, unbiased compression can reduce the total communication cost by a factor of up to \(\Theta(\sqrt{\min\{n, \kappa\}})\), where \(n\) is the number of workers and \(\kappa\) is the condition number of the functions being minimized.
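The error-cancellation effect can be checked exactly on a toy example. The sketch below assumes rand-k (keep k of d coordinates chosen uniformly at random, scale them by d/k, zero the rest) as the unbiased compressor; this is a standard example, not necessarily the compressor used in the paper's experiments. It compares n workers with independent compressors against the extreme dependent case in which all workers share the same random subset. The helpers `rand_k` and `averaged_error` are hypothetical names introduced for illustration, and exact enumeration with `Fraction` replaces random sampling so the results are deterministic.

```python
from fractions import Fraction
from itertools import combinations, product


def rand_k(x, subset):
    """Rand-k compression (assumed example compressor): keep the coordinates
    in `subset`, scale them by d/k, and zero out all other coordinates."""
    d, k = len(x), len(subset)
    return [Fraction(d, k) * x[j] if j in subset else Fraction(0) for j in range(d)]


def averaged_error(x, k, n, independent):
    """Exact E||(1/n) * sum_i C_i(x) - x||^2 over uniformly random k-subsets.

    independent=True : each worker draws its own subset (independent compressors).
    independent=False: all workers reuse one shared subset (fully correlated).
    """
    d = len(x)
    subsets = list(combinations(range(d), k))
    if independent:
        # Every joint outcome of n independent draws, each equally likely.
        draws = list(product(subsets, repeat=n))
    else:
        # Every worker uses the same subset, so there is no cancellation.
        draws = [(s,) * n for s in subsets]
    total = Fraction(0)
    for draw in draws:
        compressed = [rand_k(x, s) for s in draw]
        avg = [sum(c[j] for c in compressed) / n for j in range(d)]
        total += sum((a - xj) ** 2 for a, xj in zip(avg, x))
    return total / len(draws)


x = [1, 2, 3, 4]  # ||x||^2 = 30; with k = 1, omega = d/k - 1 = 3
print(averaged_error(x, k=1, n=3, independent=False))  # 90
print(averaged_error(x, k=1, n=3, independent=True))   # 30
```

With shared randomness the averaged error equals a single compressor's error, \(\omega\|x\|^2 = 3 \cdot 30 = 90\); with three independent workers it drops to \(\omega\|x\|^2/n = 30\), because the zero-mean, independent errors have vanishing cross terms.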
### Main contributions

- **Theoretical framework**: The paper proposes the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression, and shows that unbiased compression alone does not necessarily reduce this cost.
- **Lower-bound analysis**: It establishes lower bounds on the number of communication rounds required by algorithms that use independent unbiased compressors to minimize smooth convex functions, and shows that these bounds are tight by refining the analysis of the ADIANA algorithm.
- **Experimental verification**: Experiments support the theoretical findings on independent unbiased compression.

### Key formulas

- **Definition of unbiased compression**:
  \[ \mathbb{E}[C_i(x)] = x, \quad \mathbb{E}\big[\|C_i(x) - x\|^2\big] \leq \omega \|x\|^2 \]
  where \(C_i\) is the compressor of the \(i\)-th worker and \(\omega \geq 0\) is a fixed parameter quantifying the degree of information distortion.
- **Total communication cost (TCC)**:
  \[ \mathrm{TCC}_\epsilon\big(A, \{(f_i, C_i)\}_{i=1}^n\big) := \text{per-round cost}\big(\{C_i\}_{i=1}^n\big) \times T_\epsilon\big(A, \{(f_i, C_i)\}_{i=1}^n\big) \]
  where the per-round cost is the communication cost of one round using compressors \(\{C_i\}_{i=1}^n\), and \(T_\epsilon\) is the number of communication rounds algorithm \(A\) needs to reach an \(\epsilon\)-accurate solution given local functions \(f_i\) and compressors \(C_i\).
- **Variance reduction with independent unbiased compressors**: if \(C_1, \dots, C_n\) are mutually independent, then
  \[ \mathbb{E}\Big[\Big\|\frac{1}{n}\sum_{i=1}^n C_i(x) - x\Big\|^2\Big] \leq \frac{\omega}{n}\|x\|^2 \]
  Because each error \(C_i(x) - x\) has zero mean and the errors are independent, the cross terms vanish and the left-hand side equals \(\frac{1}{n^2}\sum_{i=1}^n \mathbb{E}\|C_i(x) - x\|^2\). Averaging over \(n\) independent compressors therefore shrinks the distortion bound by a factor of \(n\) relative to a single compressor.

Through these theoretical and experimental analyses, the paper clarifies when and by how much communication compression helps in distributed optimization, and points out directions for future research.
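The definition of unbiased compression can be made concrete with rand-k, assumed here as the example compressor (it picks k of the d coordinates uniformly at random, scales them by d/k, and zeroes the rest); it is not necessarily what the paper uses. For rand-k the distortion parameter is \(\omega = d/k - 1\). The sketch below, with the hypothetical helper `rand_k_moments`, enumerates every k-subset to compute \(\mathbb{E}[C(x)]\) and \(\mathbb{E}\|C(x) - x\|^2\) exactly and checks both conditions of the definition.

```python
from fractions import Fraction
from itertools import combinations


def rand_k_moments(x, k):
    """Exact moments of rand-k (assumed example compressor) applied to x.

    Returns (E[C(x)], E||C(x) - x||^2), computed by averaging over all
    k-subsets of the d coordinates, each of which is equally likely.
    """
    d = len(x)
    subsets = list(combinations(range(d), k))
    mean = [Fraction(0)] * d
    mse = Fraction(0)
    for s in subsets:
        # Compressed vector for this subset: kept entries scaled by d/k.
        c = [Fraction(d, k) * x[j] if j in s else Fraction(0) for j in range(d)]
        mean = [m + cj for m, cj in zip(mean, c)]
        mse += sum((cj - xj) ** 2 for cj, xj in zip(c, x))
    m = len(subsets)
    return [v / m for v in mean], mse / m


x = [1, -2, 3, 5]
k = 2
mean, mse = rand_k_moments(x, k)
omega = Fraction(len(x), k) - 1        # omega = d/k - 1 for rand-k
norm_sq = sum(xj * xj for xj in x)     # ||x||^2 = 39
print(mean == x)                       # True: E[C(x)] = x (unbiased)
print(mse == omega * norm_sq)          # True: the bound holds with equality
```

Each rand-k message carries only k of the d coordinates (plus their indices, unless these are regenerated from a shared seed), so the per-round cost in the TCC formula shrinks roughly in proportion to k/d, while \(\omega = d/k - 1\) grows. That is exactly the trade-off the TCC formula captures.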