UniFL: Enabling Loss-tolerant Transmission in Federated Learning

Zixuan Chen, Yifan Ruan, Sen Liu, Yang Xu
DOI: https://doi.org/10.1145/3663408.3663432
2024-01-01
Abstract: As distributed deep learning (DDL) gains prominence, network constraints have emerged as a critical bottleneck for DDL performance. State-of-the-art loss-tolerant (LT) transmission protocols improve DDL efficiency, but applying them in federated learning (FL) environments faces several challenges: (1) LT protocols require client-side modifications, which are impractical in FL settings; (2) keeping the LT protocol transparent to senders compromises congestion control; (3) LT protocols disrupt stream ciphers, which are widely used in FL. To address these challenges, this paper introduces UniFL, an LT protocol tailored to FL applications. UniFL integrates with FL architectures by preserving congestion control through a specialized speed limiter and by adopting an encryption technique that withstands packet loss, ensuring data integrity. UniFL is implemented in NS3 for simulation-based evaluation and is evaluated across diverse models and datasets, demonstrating substantial performance improvements in FL. Specifically, UniFL achieves up to a 40x speedup over the original FL with widely used congestion control algorithms, and achieves throughput close to that of state-of-the-art LT protocols while remaining transparent to the workers.