Abstract: We investigate the distributed optimization problem, in which a network of nodes cooperates to minimize a global objective that is a finite sum of their locally stored functions. Because nodes exchange optimization parameters over a wireless network, training large-scale models can create communication bottlenecks that slow down training. To address this issue, CHOCO-SGD was proposed; it allows information to be compressed with arbitrary precision without reducing the convergence rate for strongly convex objective functions. However, many convex objectives used in practice are not strongly convex (for example, logistic regression or Lasso), which raises the question of whether this algorithm can be applied to non-strongly convex functions. In this paper, we provide the first theoretical analysis of the convergence rate of CHOCO-SGD on non-strongly convex objectives. We derive a sufficient condition for convergence, which places a limit on the fidelity of compression. Moreover, our analysis shows that, within this fidelity threshold, the algorithm can significantly reduce the transmission burden while keeping the same order of convergence rate as its uncompressed counterpart. Numerical experiments validate these theoretical findings, showing that CHOCO-SGD improves communication efficiency while preserving the same order of convergence rate. The experiments also show that the algorithm fails to converge when the compression fidelity is low and when the topology is time-varying. Overall, our study offers insights into the applicability of CHOCO-SGD to non-strongly convex objectives, and we provide practical guidelines for researchers seeking to use this algorithm in real-world scenarios.
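The compressed gossip scheme described above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code, of one common statement of the CHOCO-SGD update: a local stochastic gradient step, a compressed update of publicly shared copies x_hat_i, and a gossip step on those copies with consensus step size gamma. It assumes top-k sparsification as the compressor and a fixed (time-invariant) ring topology with weight 1/3 on each node and its two neighbors; the names `top_k`, `ring_weights`, `choco_sgd`, `eta`, `gamma` and `k` are hypothetical. For top-k, ||v - top_k(v)||^2 <= (1 - k/d) ||v||^2, so the ratio k/d plays the role of the compression fidelity, and a small k/d corresponds to low fidelity.

```python
import random


def top_k(v, k):
    """Top-k sparsification: keep the k largest-magnitude entries of v, zero the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def ring_weights(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):  # the node itself and its two ring neighbors
            w[i][j % n] += 1.0 / 3.0
    return w


def choco_sgd(grad, x0, w, eta, gamma, k, steps, rng):
    """Hypothetical CHOCO-SGD sketch; grad(i, x_i, rng) returns node i's stochastic gradient."""
    n, d = len(x0), len(x0[0])
    x = [row[:] for row in x0]             # private iterates x_i
    x_hat = [[0.0] * d for _ in range(n)]  # public copies x_hat_i, initialized to zero
    for _ in range(steps):
        # 1) Local stochastic gradient step: x_half_i = x_i - eta * g_i.
        x_half = [[xc - eta * gc for xc, gc in zip(x[i], grad(i, x[i], rng))]
                  for i in range(n)]
        # 2) Compress only the change of the public copy: q_i = top_k(x_half_i - x_hat_i).
        q = [top_k([a - b for a, b in zip(x_half[i], x_hat[i])], k) for i in range(n)]
        # 3) Only q_i is transmitted; each node applies it to its stored copy of x_hat_i.
        #    One shared array simulates all of those copies here.
        x_hat = [[a + b for a, b in zip(x_hat[i], q[i])] for i in range(n)]
        # 4) Gossip on public copies: x_i = x_half_i + gamma * sum_j w_ij (x_hat_j - x_hat_i).
        x = [[x_half[i][c] + gamma * sum(w[i][j] * (x_hat[j][c] - x_hat[i][c])
                                          for j in range(n))
              for c in range(d)]
             for i in range(n)]
    return x


if __name__ == "__main__":
    # Toy objective f_i(x) = 0.5 * ||x - c_i||^2 with noisy gradients (illustration only).
    centers = [[1.0, 2.0, 3.0], [-1.0, 0.0, 4.0], [2.0, -2.0, 0.0], [0.0, 4.0, 1.0]]

    def noisy_grad(i, xi, rng):
        return [xc - cc + rng.gauss(0.0, 0.1) for xc, cc in zip(xi, centers[i])]

    x = choco_sgd(noisy_grad, [[0.0] * 3 for _ in range(4)], ring_weights(4),
                  eta=0.1, gamma=0.2, k=2, steps=300, rng=random.Random(0))
    avg = [sum(row[c] for row in x) / len(x) for c in range(3)]
    print("average iterate:", avg)  # close to the mean of the centers, up to gradient noise
```

The demo uses a simple quadratic per-node objective only so that its behavior is easy to check: because the mixing matrix is doubly stochastic, the gossip step leaves the network average unchanged, and for this particular objective (gradients x_i - c_i) the average iterate follows plain gradient descent toward the mean of the c_i. A non-strongly convex objective such as logistic regression can be plugged in by changing `grad`; the sketch makes no claim about the fidelity threshold or step sizes derived in the paper.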
