Data Resampling for Federated Learning with Non-IID Labels

Zhenghang, Yilun Jin, Ren, Zhikai Hu, X. Chu, Zhenheng Tang, Yiu-ming Cheung, S. Shi
Abstract: Federated learning has recently received increasing attention from academia and industry because it makes it possible to train models on decentralized data. However, most existing federated learning approaches suffer when client data are Non-Independent and Identically Distributed (Non-IID). Observing that each client has an imbalanced label distribution in many federated learning scenarios, we examine the effects of combining imbalanced learning techniques with federated learning. Through comprehensive experiments, we obtain the following findings: (1) data resampling makes the label sampling probabilities more similar across clients, which leads to faster convergence; (2) imbalanced data resampling reduces the final accuracy on each client's local dataset. Based on these two key findings, we propose a simple but effective data resampling strategy named Imbalanced Weight Decay Sampling (IWDS), which dynamically regulates the sampling probability of labels and remarkably accelerates the training process. The effectiveness of IWDS has been verified on several modern federated learning algorithms, including FedAvg, FedProx, and FedNova.
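The abstract does not give the IWDS formula, so the following is only a minimal sketch of one way a client could "dynamically regulate the sampling probability of labels". It assumes (this is an assumption, not the authors' stated method) that each client starts from class-balanced sampling, which matches finding (1), and lets the balancing weight decay geometrically toward the client's natural label distribution, which is motivated by finding (2). The function names, the `decay` parameter, and the geometric schedule are all hypothetical.

```python
import numpy as np

def label_sampling_probs(labels, num_classes, round_idx, decay=0.9):
    """Per-class sampling probabilities for one client (hypothetical schedule).

    Mixes a class-balanced distribution with the client's empirical label
    distribution. The weight on the balanced part is decay ** round_idx,
    an assumed schedule that is not taken from the paper.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    present = counts > 0
    empirical = counts / counts.sum()
    # Uniform over the classes this client actually holds; absent classes get 0.
    balanced = np.where(present, 1.0 / present.sum(), 0.0)
    w = decay ** round_idx
    return w * balanced + (1.0 - w) * empirical

def sample_indices(labels, num_classes, round_idx, batch_size, rng, decay=0.9):
    """Draw a mini-batch of example indices according to the class probabilities."""
    class_probs = label_sampling_probs(labels, num_classes, round_idx, decay)
    counts = np.bincount(labels, minlength=num_classes)
    # Spread each class's probability evenly over that class's examples.
    per_example = class_probs[labels] / counts[labels]
    per_example = per_example / per_example.sum()  # guard against rounding drift
    return rng.choice(len(labels), size=batch_size, replace=True, p=per_example)

# Example client with a 90/10 label imbalance and one class it never sees.
labels = np.array([0] * 90 + [1] * 10)
rng = np.random.default_rng(0)
early = label_sampling_probs(labels, num_classes=3, round_idx=0)
late = label_sampling_probs(labels, num_classes=3, round_idx=1000)
batch = sample_indices(labels, num_classes=3, round_idx=0, batch_size=32, rng=rng)
```

In this sketch, early rounds sample the two held classes equally, while late rounds recover the client's own 90/10 distribution. The local training loop of FedAvg, FedProx, or FedNova would be otherwise unchanged; only the mini-batch sampler differs.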
Computer Science