Abstract: Federated learning (FL) is important for privacy-preserving services because it trains models without collecting raw user data. Most FL algorithms assume that all data is annotated, which is impractical given the high cost of labeling data in real applications. To reduce the reliance on labeled data, semi-supervised federated learning (SSFL) has been proposed to use unlabeled data on clients to improve model performance. However, most existing methods either raise privacy concerns by sharing models trained on other clients, or generate pseudo-labels for unlabeled local datasets with the global model, which is usually biased towards the global data distribution. The latter can lead to sub-optimal pseudo-label accuracy because of the gap between the local data distribution and the global model, especially in non-IID settings. In this paper, we propose a semi-supervised heterogeneous federated learning method with local knowledge enhancement, called FedLoKe, which aims to train an accurate global model from both labeled and unlabeled local data with non-IID distributions. Specifically, in FedLoKe, the server maintains a global model to capture the global data distribution, and each client learns a local model to capture its local data distribution. Since the distribution captured by the local model is aligned with the local data distribution, we use it to generate high-accuracy pseudo-labels of the unlabeled dataset for global model training. To prevent the local model from severely overfitting the local labeled data, we further use the exponential moving average and apply the global model to generate pseudo-labels for local model training. Experiments on four datasets show the effectiveness of FedLoKe. Our code is available at: https://github.com/zcfinal/FedLoKe.
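The abstract relies on two generic ingredients: confidence-filtered pseudo-labeling of unlabeled client data and an exponential moving average (EMA). The Python below is a minimal, hypothetical sketch of those two building blocks only. It is not the FedLoKe implementation, which lives in the linked repository. The helper names, the 0.95 confidence threshold, the 0.99 EMA decay, and the representation of model parameters as a dict of floats are all assumptions made for illustration. The abstract also does not say exactly which quantity the EMA is applied to.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits: List[float], threshold: float = 0.95) -> Optional[int]:
    """Return the argmax class if its probability clears the threshold, else None.

    Assumption: a fixed confidence threshold (0.95 here) is used purely for
    illustration; the abstract does not specify how pseudo-labels are filtered.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def label_unlabeled(batch_logits: List[List[float]],
                    threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Keep (example index, pseudo-label) pairs for confident predictions only."""
    pairs = []
    for i, logits in enumerate(batch_logits):
        y = pseudo_label(logits, threshold)
        if y is not None:
            pairs.append((i, y))
    return pairs


def ema_update(ema: Dict[str, float], current: Dict[str, float],
               decay: float = 0.99) -> Dict[str, float]:
    """Per-parameter EMA: ema <- decay * ema + (1 - decay) * current.

    Assumption: parameters are a flat dict of floats, and the EMA smooths a
    model's parameters across updates; this is a hypothetical stand-in for
    whatever FedLoKe actually averages.
    """
    return {k: decay * ema[k] + (1.0 - decay) * current[k] for k in ema}


# Usage: hypothetical logits from a client's local model on three unlabeled examples.
local_logits = [[8.0, 0.5, 0.1], [1.0, 1.1, 0.9], [0.0, 0.2, 6.0]]
print(label_unlabeled(local_logits))  # [(0, 0), (2, 2)]; the ambiguous middle example is skipped
```

The threshold trades pseudo-label coverage for accuracy. The abstract's argument is that a local model aligned with the client's own data distribution should produce more accurate pseudo-labels on that client's unlabeled data than a global model biased towards the global distribution.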

Enhancing Federated Learning Efficiency with Generative Model-Based Data Augmentation for Non-IID Data

GFL: Federated Learning on Non-IID data via Privacy-preserving Synthetic data

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Federated Learning Empowered by Generative Content

Federated Learning with GAN-based Data Synthesis for Non-IID Clients

Federated Learning with Data-Agnostic Distribution Fusion

Data-free knowledge distillation via generator-free data generation for Non-IID federated learning

A Distributed Generative Adversarial Network for Data Augmentation under Vertical Federated Learning

Advocating for the Silent: Enhancing Federated Generalization for Non-Participating Clients

Federated Synthetic Data Generation with Differential Privacy

FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy

Feature Matching Data Synthesis for Non-IID Federated Learning

Data Augmentation Based Federated Learning

A Simple Data Augmentation for Feature Distribution Skewed Federated Learning

Generative AI-Powered Plugin for Robust Federated Learning in Heterogeneous IoT Networks

Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

Non-IID always Bad? Semi-Supervised Heterogeneous Federated Learning with Local Knowledge Enhancement

Federated Generative Learning with Foundation Models

PerFED-GAN: Personalized Federated Learning via Generative Adversarial Networks

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data