SemiDFL: A Semi-Supervised Paradigm for Decentralized Federated Learning

Xinyang Liu, Pengchao Han, Xuan Li, Bo Liu
2024-12-18
Abstract: Decentralized federated learning (DFL) realizes cooperative model training among connected clients without relying on a central server, thereby mitigating communication bottlenecks and eliminating the single-point-of-failure issue present in centralized federated learning (CFL). Most existing work on DFL focuses on supervised learning, assuming each client possesses sufficient labeled data for local training. However, in real-world applications, much of the data is unlabeled. We address this by considering a challenging yet practical semi-supervised learning (SSL) scenario in DFL, where clients may have varying data sources: some with few labeled samples, some with purely unlabeled data, and others with both. In this work, we propose SemiDFL, the first semi-supervised DFL method that enhances DFL performance in SSL scenarios by establishing a consensus in both data and model spaces. Specifically, we utilize neighborhood information to improve the quality of pseudo-labeling, which is crucial for effectively leveraging unlabeled data. We then design a consensus-based diffusion model to generate synthesized data, which is used in combination with pseudo-labeled data to create mixed datasets. Additionally, we develop an adaptive aggregation method that leverages the model accuracy on synthesized data to further enhance SemiDFL performance. Through extensive experimentation, we demonstrate the remarkable performance superiority of the proposed SemiDFL method over existing CFL and DFL schemes in both IID and non-IID SSL scenarios.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses how Decentralized Federated Learning (DFL) can train effectively when clients hold only a few labeled samples alongside unlabeled data, especially when data across clients are highly non-independent and identically distributed (non-IID). Specifically:

1. **Limitations of existing DFL methods**:
   - Most existing DFL research focuses on supervised learning and assumes that each client has sufficient labeled data for local training.
   - In practical applications, however, much of the data is unlabeled, and supervised DFL methods cannot make use of it.
2. **The need to introduce Semi-Supervised Learning (SSL)**:
   - Semi-supervised learning improves model performance by exploiting unlabeled data, which makes it especially suitable when labeled data is scarce.
   - Applying SSL to DFL is challenging because DFL lacks a central coordinating server, and the data sources and distributions of clients can be very diverse.
3. **Specific problem description**:
   - How can labeled and unlabeled data be combined effectively for model training in the DFL framework?
   - How can model consistency among clients be ensured when the data distribution is highly non-IID?

To solve these problems, the paper proposes SemiDFL, a new semi-supervised decentralized federated learning paradigm that aims to improve the performance of DFL in semi-supervised scenarios by establishing consensus in both the model space and the data space. Its components are:

- **Neighborhood Pseudo-Labeling**: Improves the quality of pseudo-labels by combining information from neighboring clients' classifiers.
- **Consensus MixUp**: Mixes generated synthetic data with labeled and pseudo-labeled data to form a consensus data space.
- **Adaptive Aggregation**: Dynamically adjusts the aggregation weights according to the performance of each classifier on the generated data to optimize the model aggregation process.

With these components, SemiDFL can handle diverse semi-supervised DFL tasks under different data settings, and the paper reports that it outperforms existing CFL and DFL schemes in both IID and non-IID SSL scenarios.
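
The three components named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give formulas, so the function names, the plain averaging of neighbor predictions, the confidence threshold, and the softmax weighting over accuracies are all assumptions chosen for illustration. Models are represented as flat lists of floats, and only the standard library is used.

```python
import math


def neighborhood_pseudo_label(neighbor_probs, threshold=0.8):
    """Pseudo-label one unlabeled sample from several classifiers.

    neighbor_probs: one class-probability vector per classifier (the
    client's own plus its neighbors'). ASSUMPTION: the vectors are simply
    averaged and the label is kept only if the averaged confidence reaches
    `threshold`; the paper's actual rule may differ.
    """
    n = len(neighbor_probs)
    num_classes = len(neighbor_probs[0])
    avg = [sum(p[c] for p in neighbor_probs) / n for c in range(num_classes)]
    confidence = max(avg)
    label = avg.index(confidence)
    return (label if confidence >= threshold else None), confidence


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard MixUp: convex combination of two samples and their labels.

    lam lies in [0, 1]; MixUp commonly draws it from a Beta distribution.
    Here x_b, y_b would be a synthesized sample and x_a, y_a a labeled or
    pseudo-labeled one (one-hot or soft label vectors).
    """
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def adaptive_weights(accuracies, temperature=1.0):
    """Turn per-classifier accuracies on generated data into weights.

    ASSUMPTION: a softmax over accuracy / temperature, so that more
    accurate classifiers receive larger weights and the weights sum to 1.
    """
    scaled = [a / temperature for a in accuracies]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(models, weights):
    """Weighted average of flat parameter vectors from neighboring clients."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


if __name__ == "__main__":
    label, conf = neighborhood_pseudo_label([[0.9, 0.1], [0.7, 0.3]], 0.75)
    print("pseudo-label:", label, "confidence:", round(conf, 3))
    w = adaptive_weights([0.9, 0.5, 0.5])
    print("weights:", [round(v, 3) for v in w])
    print("aggregated:", aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], w))
```

In this sketch each client would run the first two functions locally to build its mixed training set, then use the last two when combining its neighbors' classifiers. No central server appears anywhere in the loop, which is the property that distinguishes DFL from CFL.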