Federated Learning Model Training Method Based on Data Features Perception Aggregation
Zeng Yan, Yan ZhongYi, Zhang JiLin, Zhao NaiLiang, Ren YongJian, Wan Jian, Yu Jun
DOI: https://doi.org/10.1109/vtc2021-fall52928.2021.9625291
2021-01-01
Abstract: The rapidly growing number of Internet of Things (IoT) devices is generating huge quantities of data, but public concern over data privacy makes users reluctant to send their data to a central server for machine learning. Federated learning is an emerging approach that allows edge devices to collaboratively learn and share a model while keeping the training data on the devices. Federated learning decouples “model training” from “direct access to the original training data”. However, in IoT settings where wireless network resources are constrained, the key problem of federated learning is the communication overhead of parameter synchronization, which wastes bandwidth, increases training time, and can even reduce model accuracy. Moreover, because IoT devices collect data from different users, the data across devices can be highly non-independent and identically distributed (non-IID), so both the feature distribution and the label distribution vary from device to device. As a result, the test accuracy of the federated model drops and the communication cost of training it rises. In this paper, we propose the FedCC algorithm to improve the accuracy of the federated model in the non-IID scenario. FedCC constructs client groups by mining data similarity and selects one model from each client group to upload to the cloud server for model aggregation. Our experiments show that FedCC not only outperforms popular state-of-the-art federated learning algorithms on CNN and MLP architectures trained on the MNIST and CIFAR-10 datasets, but also reduces the overall communication cost.
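The abstract describes FedCC only at a high level: group clients by data similarity, then upload one model per group for aggregation. The Python sketch below illustrates that general idea and is not the authors' implementation. Every concrete choice in it is an assumption, because the abstract specifies none of them: similarity is cosine similarity between per-client label histograms, grouping is a greedy threshold rule, each group's representative is its client with the most samples, and the server averages the uploaded representatives weighted by group sample counts (FedAvg-style). The names `cosine`, `group_clients`, `aggregate`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two label histograms (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_clients(hists: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Hypothetical greedy grouping: a client joins the first group whose seed
    client is at least `threshold`-similar; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for cid, h in hists.items():
        for g in groups:
            if cosine(hists[g[0]], h) >= threshold:
                g.append(cid)
                break
        else:
            groups.append([cid])
    return groups


def aggregate(groups: List[List[str]],
              sizes: Dict[str, int],
              weights: Dict[str, List[float]]) -> List[float]:
    """Only one model per group is 'uploaded' (assumed: the client with the most
    samples). The server averages these uploads, each weighted by its whole
    group's share of the total sample count."""
    total = sum(sizes.values())
    dim = len(next(iter(weights.values())))
    global_w = [0.0] * dim
    for g in groups:
        rep = max(g, key=lambda c: sizes[c])          # representative client
        share = sum(sizes[c] for c in g) / total      # group's data share
        for i, w in enumerate(weights[rep]):
            global_w[i] += share * w
    return global_w


# Toy example: clients "a" and "b" have similar label skew, "c" differs.
hists = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
sizes = {"a": 100, "b": 50, "c": 50}
weights = {"a": [1.0, 1.0], "b": [3.0, 3.0], "c": [2.0, 0.0]}

groups = group_clients(hists, threshold=0.9)
global_w = aggregate(groups, sizes, weights)
print(groups)    # [['a', 'b'], ['c']]
print(global_w)  # [1.25, 0.75]
```

In this toy run, three clients form two groups, so the server receives two model uploads instead of three. This is the kind of communication saving the abstract attributes to FedCC, although the real algorithm's grouping and selection rules may differ.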