Federated learning with workload-aware client scheduling in heterogeneous systems

Li Li,Duo Liu,Moming Duan,Yu Zhang,Ao Ren,Xianzhang Chen,Yujuan Tan,Chengliang Wang
DOI: https://doi.org/10.1016/j.neunet.2022.07.030
Abstract:Federated Learning (FL) is a novel distributed machine learning, which allows thousands of edge devices to train models locally without uploading data to the central server. Since devices in real federated settings are resource-constrained, FL encounters systems heterogeneity, which causes considerable stragglers and incurs significant accuracy degradation. To tackle the challenges of systems heterogeneity and improve the robustness of the global model, we propose a novel adaptive federated framework in this paper. Specifically, we propose FedSAE that leverages the workload completion history of clients to adaptively predict the affordable training workload for each device. Consequently, FedSAE can significantly reduce stragglers in highly heterogeneous systems. We incorporate Active Learning into FedSAE to dynamically schedule participants. The server evaluates the devices' training value based on their training loss in each round, and larger-value clients are selected with a higher probability. As a result, the model convergence is accelerated. Furthermore, we propose q-FedSAE that combines FedSAE and q-FFL to improve global fairness in highly heterogeneous systems. The evaluations conducted in a highly heterogeneous system demonstrate that both FedSAE and q-FedSAE converge faster than FedAvg. In particular, FedSAE outperforms FedAvg across multiple federated datasets - FedSAE improves testing accuracy by 22.19% and reduces stragglers by 90.69% on average. Moreover, holding the same accuracy as FedSAE, q-FedSAE allows for more robust convergence and fairer model performance than q-FedAvg, FedSAE.
What problem does this paper attempt to address?