Knowledge-Enhanced Semi-Supervised Federated Learning for Aggregating Heterogeneous Lightweight Clients in IoT

Jiaqi Wang, Shenglai Zeng, Zewei Long, Yaqing Wang, Houping Xiao, Fenglong Ma
2023-03-05
Abstract: Federated learning (FL) enables multiple clients to train models collaboratively without sharing local data, which has achieved promising results in different areas, including the Internet of Things (IoT). However, end IoT devices do not have abilities to automatically annotate their collected data, which leads to the label shortage issue at the client side. To collaboratively train an FL model, we can only use a small number of labeled data stored on the server. This is a new yet practical scenario in federated learning, i.e., labels-at-server semi-supervised federated learning (SemiFL). Although several SemiFL approaches have been proposed recently, none of them can focus on the personalization issue in their model design. IoT environments make SemiFL more challenging, as we need to take device computational constraints and communication cost into consideration simultaneously. To tackle these new challenges together, we propose a novel SemiFL framework named pFedKnow. pFedKnow generates lightweight personalized client models via neural network pruning techniques to reduce communication cost. Moreover, it incorporates pretrained large models as prior knowledge to guide the aggregation of personalized client models and further enhance the framework performance. Experiment results on both image and text datasets show that the proposed pFedKnow outperforms state-of-the-art baselines as well as reducing considerable communication cost. The source code of the proposed pFedKnow is available at https://github.com/JackqqWang/pfedknow/tree/master.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses three challenges that federated learning (FL) faces in Internet of Things (IoT) environments:

1. **Label shortage**: In real IoT applications, end devices cannot automatically label the data they collect, so clients hold only unlabeled data. Training must rely on a small amount of labeled data stored on the server. The paper calls this setting labels-at-server semi-supervised federated learning (SemiFL).
2. **Need for personalized models**: Existing SemiFL methods focus on learning one general global model, which may not represent the characteristics of each IoT user. Each client therefore needs a personalized model that fits its own heterogeneous data.
3. **Resource and communication constraints**: IoT devices have limited computing resources and network bandwidth. Traditional federated learning methods do not fully account for these limits, so the method must reduce communication cost without sacrificing performance.

To address these challenges, the authors propose a framework named pFedKnow, which combines three components:

- **Lightweight personalized client models**: Neural network pruning produces a lightweight personalized model for each client, which reduces communication cost.
- **Knowledge-enhanced aggregation**: Pretrained large models serve as prior knowledge that guides the aggregation of the personalized client models and improves overall performance.
- **Structure-aware collaborative distillation**: A structure-aware collaborative distillation mechanism fuses personalized models with different structures on the server while preserving their personalization.

According to the reported experiments on both image and text datasets, pFedKnow outperforms state-of-the-art baselines while reducing communication cost.
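
To make the pruning and heterogeneous-aggregation ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not specify pFedKnow's pruning criterion, and the paper fuses client models through structure-aware collaborative distillation, which is replaced here by a much simpler mask-aware average. The names `magnitude_prune`, `masked_average`, and `upload_size`, the per-layer magnitude criterion, and the toy weights are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # layer name -> flattened weights
Mask = Dict[str, List[int]]       # layer name -> 1 (kept) or 0 (pruned)


def magnitude_prune(weights: Weights, keep_ratio: float) -> Tuple[Weights, Mask]:
    """Keep the largest-magnitude fraction of each layer and zero the rest.

    Assumption: per-layer magnitude pruning; the paper may use another rule.
    """
    pruned: Weights = {}
    mask: Mask = {}
    for name, values in weights.items():
        k = max(1, int(keep_ratio * len(values)))
        # Indices of the k entries with the largest absolute value.
        order = sorted(range(len(values)), key=lambda i: -abs(values[i]))
        keep = set(order[:k])
        mask[name] = [1 if i in keep else 0 for i in range(len(values))]
        pruned[name] = [v if i in keep else 0.0 for i, v in enumerate(values)]
    return pruned, mask


def masked_average(models: List[Weights], masks: List[Mask]) -> Weights:
    """Average each coordinate over only the clients that kept it.

    Assumption: a simple stand-in for the paper's distillation-based fusion.
    """
    merged: Weights = {}
    for name in models[0]:
        merged[name] = []
        for i in range(len(models[0][name])):
            kept = [m[name][i] for m, mk in zip(models, masks) if mk[name][i]]
            merged[name].append(sum(kept) / len(kept) if kept else 0.0)
    return merged


def upload_size(mask: Mask) -> int:
    """Number of weights a client would transmit (entries kept by the mask)."""
    return sum(sum(layer) for layer in mask.values())


# Two hypothetical clients whose pruned structures differ.
client_a = {"fc": [0.9, -0.1, 0.4, 0.05]}
client_b = {"fc": [0.2, -0.8, 0.6, 0.01]}
pruned_a, mask_a = magnitude_prune(client_a, keep_ratio=0.5)
pruned_b, mask_b = magnitude_prune(client_b, keep_ratio=0.5)
merged = masked_average([pruned_a, pruned_b], [mask_a, mask_b])
print(mask_a, mask_b, merged, upload_size(mask_a))
```

Because each client keeps a different subset of weights, a plain element-wise mean would pull coordinates kept by only one client toward zero. Averaging over the clients that actually kept each coordinate avoids that, and the mask size gives a rough proxy for how much each client uploads.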