Abstract: Communication overhead is a significant bottleneck in federated learning (FL), and it has been exacerbated by the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and designs a series of optimization techniques to improve the efficiency and robustness of RDMA-based communication. We implement FedRDMA atop an industrial federated learning framework and evaluate it on a real-world cross-silo FL scenario. The experimental results show that FedRDMA can achieve up to a 3.8× speedup in communication efficiency compared to traditional TCP/IP-based FL systems.

What problem does this paper attempt to address?

This paper addresses the communication overhead of federated learning (FL), particularly when training large language models (LLMs) across institutions (cross-silo). As AI models grow, communication overhead has become a significant bottleneck in FL, especially over wide-area networks (WANs), and it remains prominent even when bandwidth is high. For example, when fully fine-tuning GPT-2 with two NVIDIA A800 80 GB GPUs over a 10 Gbps link, transmitting the model weights still takes 45.9 seconds per round, accounting for more than 44.97% of the total FL time.

To overcome this challenge, the authors propose FedRDMA, an efficient cross-silo FL system based on Remote Direct Memory Access (RDMA). FedRDMA divides the updated model into small chunks and applies a series of optimization techniques to improve the efficiency and robustness of RDMA communication, enabling more efficient exchange of model parameters over a WAN. Experimental results show that, compared with traditional TCP/IP-based FL systems, FedRDMA improves communication efficiency by up to 3.8 times.

The main contributions of the paper are as follows:
- Preliminary experiments show that cross-silo FL still suffers from high communication overhead even with high bandwidth and ample computing resources.
- The paper proposes FedRDMA, an efficient cross-silo FL system that uses chunked RDMA transmission combined with a series of optimization techniques.
- The authors implement FedRDMA on the industrial-grade FL framework FATE and conduct extensive experiments, verifying that it can reduce communication time by up to a factor of 3.8.
In addition, the paper explores how different hyperparameters affect the performance of FedRDMA and how it can be combined with Parameter-Efficient Fine-Tuning (PEFT) methods to further improve communication efficiency. Overall, FedRDMA aims to remove the communication bottleneck in WAN environments by using RDMA, thereby accelerating cross-silo federated learning of large language models.
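The chunking idea described above can be illustrated with a minimal Python sketch. This is not the FedRDMA implementation and performs no RDMA operations. It only shows, under stated assumptions, how a serialized model update might be split into fixed-size, indexed chunks and reassembled at the receiver regardless of arrival order. The function names `split_into_chunks` and `reassemble` and the 4096-byte chunk size are hypothetical choices made for illustration.

```python
from typing import Iterator, List, Tuple


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs covering the payload in fixed-size pieces.

    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        yield index, payload[offset:offset + chunk_size]


def reassemble(chunks: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload from indexed chunks, whatever order they arrived in."""
    ordered = sorted(chunks, key=lambda item: item[0])
    return b"".join(data for _, data in ordered)


# Stand-in for a serialized model update (10,240 bytes); a real update is far larger.
payload = bytes(range(256)) * 40

# Hypothetical chunk size, chosen only to keep the example small.
chunks = list(split_into_chunks(payload, chunk_size=4096))

# Simulate out-of-order arrival by reversing the chunk list before reassembly.
restored = reassemble(list(reversed(chunks)))
```

In a sketch like this, each chunk carries its own index, so a lost or corrupted chunk could in principle be retransmitted on its own instead of resending the whole update. Whether FedRDMA handles retransmission this way is not stated in the summary above.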

FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

A Communication Efficient Vertical Federated Learning Framework

Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs

FedHe: Heterogeneous Models and Communication-Efficient Federated Learning

FedCross: Towards Accurate Federated Learning via Multi-Model Cross-Aggregation

FedStar: Efficient Federated Learning on Heterogeneous Communication Networks

Efficient and Less Centralized Federated Learning

FedScalar: A Communication efficient Federated Learning

Communication-Efficient Federated Learning in Channel Constrained Internet of Things

AsyFed: Accelerated Federated Learning with Asynchronous Communication Mechanism

EcoFed: Efficient Communication for DNN Partitioning-Based Federated Learning

FedDQ: A communication-efficient federated learning approach for Internet of Vehicles

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning

Communication-Efficient Federated Distillation with Active Data Sampling

Computation and Communication Efficient Federated Learning With Adaptive Model Pruning

Adaptive Block-Wise Regularization and Knowledge Distillation for Enhancing Federated Learning

Flexible LAN-WAN Orchestration for Communication Efficient Federated Learning over Large-Scale Mobile Devices

Accelerating Federated Learning with Cluster Construction and Hierarchical Aggregation

FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks

Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning

FedLion: Faster Adaptive Federated Optimization with Fewer Communication