FedComp: A Federated Learning Compression Framework for Resource-Constrained Edge Computing Devices

Donglei Wu, Weihao Yang, Haoyu Jin, Xiangyu Zou, Wen Xia, Binxing Fang
DOI: https://doi.org/10.1109/tcad.2023.3307459
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Top-K sparsification-based compression techniques are popular and powerful for reducing communication costs in federated learning (FL). However, existing Top-K sparsification-based compression methods suffer from two critical issues that severely hinder their deployment, particularly in FL, which often involves a vast number of resource-constrained devices: 1) the low compressibility of the Top-K parameters' indexes significantly limits the overall compression ratio (CR) and 2) the residual accumulation techniques used to maintain model quality consume large amounts of memory. To address these issues, we propose a novel FL compression framework, named FedComp, for deep neural networks (DNNs). FedComp achieves a higher communication CR while maintaining comparable model quality at low memory cost. Specifically, FedComp incorporates the following three key components: 1) a tensor-wise index-sharing mechanism that greatly reduces the index proportion by sharing one index among multiple elements of the tensor; 2) a fine-grained parameter packing strategy that reduces the transmission of duplicate values and indexes by considering their properties, thereby further reducing the overall communication cost; and 3) a residual compressor that significantly reduces memory cost by enhancing the compressibility of floating-point residuals and achieving a high CR with a lossless encoding scheme. Experiments on mainstream machine learning (ML) tasks with different DNN structures and datasets demonstrate that FedComp outperforms state-of-the-art FL compression algorithms, achieving a communication CR up to $28.5\times$ higher while reducing memory costs on the local residual model by $21.04\times$–$50.59\times$, without degrading FL training performance.
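The two issues the abstract names can be seen in a minimal sketch of generic Top-K sparsification with residual accumulation. This is the baseline technique the abstract criticizes, not FedComp's index sharing, packing, or residual compressor, whose details are not given here. The function names, the 4-byte value and index sizes, and the toy tensor are all illustrative assumptions.

```python
import numpy as np

def topk_sparsify(update, residual, k):
    """Generic Top-K baseline (assumed form, not FedComp): send the k
    largest-magnitude entries and keep everything else as a local residual."""
    acc = (update + residual).ravel()
    # Indices of the k entries with the largest absolute value, in ascending order.
    idx = np.sort(np.argpartition(np.abs(acc), -k)[-k:])
    values = acc[idx].copy()
    # The residual is a dense tensor as large as the model itself,
    # which is the memory cost the abstract refers to.
    new_residual = acc.copy()
    new_residual[idx] = 0.0
    return idx, values, new_residual.reshape(update.shape)

def compression_ratio(n, k, value_bytes=4, index_bytes=4):
    """Dense size divided by sparse size; byte widths are assumptions."""
    return (n * value_bytes) / (k * (value_bytes + index_bytes))

update = np.array([0.1, -2.0, 0.3, 1.5], dtype=np.float32)
residual = np.zeros_like(update)
idx, values, residual = topk_sparsify(update, residual, k=2)

# With 4-byte indexes, half of every transmitted pair is index overhead.
cr_with_indexes = compression_ratio(n=1000, k=10)
cr_values_only = compression_ratio(n=1000, k=10, index_bytes=0)
```

Under these assumptions, sending one 4-byte index per 4-byte value halves the achievable CR compared with sending values alone, which is why reducing the index proportion matters.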