ESync: Accelerating Intra-Domain Federated Learning in Heterogeneous Data Centers

Zonghang Li, Huaman Zhou, Tianyao Zhou, Hongfang Yu, Zenglin Xu, Gang Sun
DOI: https://doi.org/10.1109/tsc.2020.3044043
IF: 11.019
2022-01-01
IEEE Transactions on Services Computing
Abstract: Federated Learning (FL) enables privacy-preserving collaborative learning among multiple isolated parties while keeping their private data local. Cross-device and cross-silo FL have achieved great success in cross-domain applications, in which scarce communication resources are the primary bottleneck. Driven by the need to combine heterogeneous machines from different parties into a shared data center, we identify intra-domain FL, a new type of FL in which isolated parties collaborate within the shared data center and strong computational heterogeneity becomes the primary bottleneck. To mitigate the training inefficiency caused by stragglers, this article proposes ESync, an efficient synchronization algorithm that allows parties to run different numbers of local training iterations under the coordination of a novel scheduler, the State Server. We derive bounds on the weight divergence and optimality gap of ESync, and analyze the trade-off between convergence accuracy and communication efficiency. Extensive experiments compare ESync with SSGD, ASGD, DC-ASGD, FedAvg, FedAsync, TiFL, and FedDrop under strong computational heterogeneity. Numerical results show that ESync achieves a substantial speedup without loss of accuracy, demonstrating its effectiveness in both training efficiency and converged accuracy.
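The idea of letting parties run different numbers of local iterations can be illustrated with a small sketch. This is not the authors' implementation: the budgeting rule (each party keeps training for as long as the slowest party needs for one iteration), the unweighted averaging step, and the names `local_iteration_budget` and `average_models` are all assumptions made for illustration; the abstract does not specify how the State Server assigns iteration counts.

```python
import math


def local_iteration_budget(iter_times):
    """Hypothetical budgeting rule (an assumption, not taken from the paper):
    each party runs as many local iterations as fit into the time the
    slowest party needs for a single iteration, and at least one."""
    slowest = max(iter_times)
    return [max(1, math.floor(slowest / t)) for t in iter_times]


def average_models(models):
    """Unweighted element-wise average of equally sized weight vectors
    (a stand-in for whatever aggregation the real system uses)."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


# Seconds per local iteration for three parties of differing speed.
iter_times = [1.0, 2.0, 4.0]
budgets = local_iteration_budget(iter_times)
print(budgets)  # the fastest party gets the largest budget

# Toy weight vectors standing in for the parties' locally trained models.
print(average_models([[1.0, 2.0], [3.0, 4.0]]))
```

Under this assumed rule no party idles while waiting for a straggler, which is the inefficiency the abstract attributes to strong computational heterogeneity.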