BIRD+: Design of a Lightweight Communication Compressor for Resource-Constrained Distribution Learning Platforms

Donglei Wu, Weihao Yang, Xiangyu Zou, Hao Feng, Dingwen Tao, Shiyi Li, Wen Xia, Binxing Fang
DOI: https://doi.org/10.1109/tpds.2024.3447221
IF: 5.3
2024-09-27
IEEE Transactions on Parallel and Distributed Systems
Abstract: The Top-K sparsification-based compression framework has been extensively explored for reducing communication costs in distributed learning. However, we identify several issues with existing Top-K sparsification-based compression methods: (i) the limited compressibility of the Top-K parameters' indexes critically restricts the overall communication compression ratio; (ii) several time-consuming compression operations significantly offset the benefits of communication compression; (iii) the use of error feedback techniques to maintain model quality results in a high memory footprint. To solve these issues, we propose BIRD, a lightweight tensor-wise Bi-Random sampling strategy with an expectation invariance property. Specifically, BIRD applies a tensor-wise index sharing mechanism that reduces the index proportion by allowing multiple tensor elements to share a single index, thus improving the overall compression ratio. Additionally, BIRD replaces the time-consuming Top-K sorting with a faster Bi-Random sampling strategy based on the aforementioned index sharing mechanism, significantly reducing compression overheads. Moreover, BIRD builds an expectation invariance property into the Bi-Random sampling to ensure an approximately unbiased representation of the L1-norm of the sampled tensors, effectively maintaining model quality without incurring extra memory costs. We further optimize BIRD into BIRD+ by introducing uniform distribution-based sampling and Gamma correction into the tensor-wise sampling process, achieving more flexible adjustment of the sparsity with better convergence performance. Experimental evaluations across multiple conventional distributed learning tasks demonstrate that, compared to state-of-the-art approaches, BIRD+ achieves higher communication compression ratios (up to 36.2×) and higher computation throughput (up to 149.6×) while maintaining model quality without incurring extra memory costs.
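The two ideas the abstract leans on, index sharing and expectation-preserving random sampling, can be illustrated with a small sketch. This is not the paper's Bi-Random sampling: it is a generic unbiased random sparsification in which whole rows are kept at random and rescaled, so one row index is shared by every element of the row. The function names, the 32-bit value and index widths, and the row-wise grouping are all assumptions made for illustration.

```python
import random

def compression_ratio(n, k, group=1, value_bits=32, index_bits=32):
    """Dense size divided by sparse size when `group` kept values share one index.

    Assumed cost model (hypothetical): each kept value costs `value_bits`,
    and every `group` kept values cost one index of `index_bits`.
    """
    dense_bits = n * value_bits
    sparse_bits = k * value_bits + (k // group) * index_bits
    return dense_bits / sparse_bits

def sample_rows_unbiased(rows, keep_prob, rng):
    """Keep each row with probability `keep_prob` and rescale it by 1/keep_prob.

    The rescaling makes the expected value of every element equal to the
    original element. Only one index per kept row is stored.
    """
    kept = {}
    for i, row in enumerate(rows):
        if rng.random() < keep_prob:
            kept[i] = [v / keep_prob for v in row]
    return kept

def densify(kept, n_rows, n_cols):
    """Rebuild a dense matrix, filling dropped rows with zeros."""
    return [kept.get(i, [0.0] * n_cols) for i in range(n_rows)]

# Per-element indexes: half of the payload is index data.
print(compression_ratio(n=1_000_000, k=10_000))             # 50.0
# One index shared by 100 values: the index share becomes negligible.
print(compression_ratio(n=1_000_000, k=10_000, group=100))  # about 99.0
```

Under this assumed cost model, per-element indexes cap the ratio at n/(2k), which is the limitation named in issue (i); sharing an index across many elements moves the ratio toward n/k. The rescaling in `sample_rows_unbiased` removes the need for an error-feedback buffer only in expectation, at the price of higher variance per step.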
computer science, theory & methods; engineering, electrical & electronic