Abstract: Decentralized federated learning (DFL) realizes cooperative model training among connected clients without relying on a central server, thereby mitigating communication bottlenecks and eliminating the single-point-of-failure issue present in centralized federated learning (CFL). Most existing work on DFL focuses on supervised learning, assuming each client possesses sufficient labeled data for local training. However, in real-world applications, much of the data is unlabeled. We address this by considering a challenging yet practical semi-supervised learning (SSL) scenario in DFL, where clients may have varying data sources: some with few labeled samples, some with purely unlabeled data, and others with both. In this work, we propose SemiDFL, the first semi-supervised DFL method that enhances DFL performance in SSL scenarios by establishing a consensus in both data and model spaces. Specifically, we utilize neighborhood information to improve the quality of pseudo-labeling, which is crucial for effectively leveraging unlabeled data. We then design a consensus-based diffusion model to generate synthesized data, which is used in combination with pseudo-labeled data to create mixed datasets. Additionally, we develop an adaptive aggregation method that leverages model accuracy on synthesized data to further enhance SemiDFL performance. Through extensive experimentation, we demonstrate the performance superiority of the proposed SemiDFL method over existing CFL and DFL schemes in both IID and non-IID SSL scenarios.

What problem does this paper attempt to address?

The paper asks how Decentralized Federated Learning (DFL) can effectively handle clients that hold limited labeled data alongside unlabeled data, especially when the data is highly non-independent and identically distributed (non-IID). Specifically:

1. **Limitations of existing DFL methods**:
   - Most existing DFL research focuses on supervised learning and assumes that each client has sufficient labeled data for local training.
   - In practice, much of the data is unlabeled, and existing DFL methods cannot make effective use of it.
2. **The need to introduce Semi-Supervised Learning (SSL)**:
   - Semi-supervised learning improves model performance by exploiting unlabeled data, which makes it well suited to settings where labeled data is scarce.
   - Applying SSL to DFL is challenging because DFL lacks a central coordinating server, and the data sources and distributions of clients can be very diverse.
3. **Specific problem description**:
   - How can labeled and unlabeled data be combined effectively for model training in the DFL framework?
   - How can model consistency among clients be ensured when the data distribution is highly non-IID?

To solve these problems, the paper proposes SemiDFL, a new semi-supervised decentralized federated learning paradigm that aims to improve the performance of DFL in semi-supervised scenarios by establishing consensus in both the model space and the data space. Specific methods include:

- **Neighborhood Pseudo-Labeling**: Improve the quality of pseudo-labels by combining the information of neighborhood classifiers.
- **Consensus MixUp**: Mix the generated synthetic data with labeled and pseudo-labeled data to form a consensus data space.
- **Adaptive Aggregation**: Dynamically adjust the aggregation weights according to the performance of each classifier on the generated data to optimize the model aggregation process.

With these methods, SemiDFL can handle diverse semi-supervised DFL tasks under different data settings. According to the abstract, it outperforms existing CFL and DFL schemes in both IID and non-IID SSL scenarios.
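
The three components can be illustrated with a minimal sketch. This summary does not give the paper's actual formulas, so everything below is an assumption made for illustration: pseudo-labels are taken from the plain average of neighbor class probabilities with a confidence threshold, MixUp is the standard convex combination of two samples, and aggregation weights are a softmax over each neighbor's accuracy on synthesized data. The function names, the `threshold` and `temperature` parameters, and the flat-list model representation are all hypothetical.

```python
import math


def neighborhood_pseudo_label(neighbor_probs, threshold=0.8):
    """Pseudo-label one unlabeled sample from several classifiers.

    neighbor_probs: list of class-probability lists, one per classifier
    (the client's own model plus its neighbors). Assumption: probabilities
    are simply averaged; the paper may combine them differently.
    Returns (label, confidence), or None if confidence is below threshold.
    """
    num_classes = len(neighbor_probs[0])
    avg = [
        sum(p[c] for p in neighbor_probs) / len(neighbor_probs)
        for c in range(num_classes)
    ]
    label = max(range(num_classes), key=lambda c: avg[c])
    if avg[label] < threshold:
        return None  # too uncertain; skip this sample
    return label, avg[label]


def mixup(a, b, lam):
    """Standard MixUp: convex combination of two vectors (features or
    one-hot / soft labels) with mixing coefficient lam in [0, 1]."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def adaptive_weights(accuracies, temperature=1.0):
    """Map per-classifier accuracies on synthesized data to aggregation
    weights. Assumption: a softmax; higher accuracy gets a larger weight."""
    scaled = [acc / temperature for acc in accuracies]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(models, weights):
    """Weighted average of models given as flat parameter lists."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


if __name__ == "__main__":
    # Three classifiers agree on class 0 for this sample.
    print(neighborhood_pseudo_label(
        [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]], threshold=0.7))
    # Mix a synthesized sample's soft label with a pseudo-labeled one.
    print(mixup([1.0, 0.0], [0.0, 1.0], lam=0.5))
    # The neighbor with higher accuracy on synthesized data gets more weight.
    weights = adaptive_weights([0.9, 0.5, 0.5])
    print(aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], weights))
```

In a real DFL deployment each client would run these steps locally, using only the models received from its graph neighbors. The sketch leaves out the diffusion model that produces the synthesized data.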

SemiDFL: A Semi-Supervised Paradigm for Decentralized Federated Learning

Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model

SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data

Exploring One-Shot Semi-supervised Federated Learning with Pre-trained Diffusion Models

Decentralized Federated Learning: A Survey and Perspective

Efficient Semi-Supervised Federated Learning for Heterogeneous Participants

DFML: Decentralized Federated Mutual Learning

(FL)$^2$: Overcoming Few Labels in Federated Semi-Supervised Learning

Federated Semi-Supervised Learning with Class Distribution Mismatch

FedSiam-DA: Dual-aggregated Federated Learning Via Siamese Network for Non-Iid Data

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Improving the Model Consistency of Decentralized Federated Learning

Enhancing Federated Learning with In-Cloud Unlabeled Data

Clients Help Clients: Alternating Collaboration for Semi-Supervised Federated Learning

Semi-Supervised Decentralized Machine Learning with Device-to-Device Cooperation

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

Decentralized Federated Learning: Balancing Communication and Computing Costs

Enhancing Federated Learning with Server-Side Unlabeled Data by Adaptive Client and Data Selection

Divergence-aware Federated Self-Supervised Learning

Confederated Learning: Federated Learning with Decentralized Edge Servers

GDST: Global Distillation Self-Training for Semi-Supervised Federated Learning