Abstract: Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets. The challenge lies in the fact that the client side private data may not be identically and independently distributed, significantly impacting the accuracy of the global model. Existing methods commonly address the Non-IID challenge by focusing on optimization, client selection and data complement. However, most approaches tend to overlook the perspective of the private data itself due to privacy constraints. Intuitively, statistical distinctions among private data on the client side can help mitigate the Non-IID degree. Besides, the recent advancements in dataset condensation technology have inspired us to investigate its potential applicability in addressing Non-IID issues while maintaining privacy. Motivated by this, we propose DCFL which divides clients into groups by using the Centered Kernel Alignment (CKA) method, then uses dataset condensation methods with non-IID awareness to complete clients. The private data from clients within the same group is complementary and their condensed data is accessible to all clients in the group. Additionally, CKA-guided client selection strategy, filtering mechanisms, and data enhancement techniques are incorporated to efficiently and precisely utilize the condensed data, enhance model performance, and minimize communication time. Experimental results demonstrate that DCFL achieves competitive performance on popular federated learning benchmarks including MNIST, FashionMNIST, SVHN, and CIFAR-10 with existing FL protocol.
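The abstract names Centered Kernel Alignment (CKA) as the similarity measure used to group clients, but does not say which representations are compared or how the scores are turned into groups. The following is a minimal sketch of standard linear CKA only, assuming each client is represented by a feature matrix computed on the same set of probe inputs. The function names `linear_cka` and `pairwise_cka` and the shared-probe setup are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Both matrices must describe the same n_samples inputs; the feature
    dimensions may differ.
    """
    # Center every feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def pairwise_cka(features: list[np.ndarray]) -> np.ndarray:
    """Symmetric matrix of linear CKA scores for a list of client features.

    Hypothetical helper: how DCFL converts such scores into complementary
    groups is not specified in the text above.
    """
    n = len(features)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(features[i], features[j])
    return sim
```

A score of 1 means the two representations match up to rotation and isotropic scaling, and lower scores indicate more dissimilar representations. A server could pass one feature matrix per client to `pairwise_cka` and apply any grouping rule to the resulting matrix.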

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the non-independent and identically distributed (Non-IID) data problem in federated learning (FL). Specifically, the challenge in federated learning is that the private data of clients may not be independent and identically distributed, which significantly affects the accuracy of the global model. Existing methods usually deal with the Non-IID challenge through optimization, client selection, and data supplementation, but these methods often overlook the perspective of the private data itself due to privacy limitations.

To explain this problem in more detail, we can use the following formula to represent the impact of Non-IID data:

\[ \text{weight divergence} = \frac{\|w_{\text{FedAvg}} - w_{\text{SGD}}\|}{\|w_{\text{SGD}}\|} \]

where \( w_{\text{FedAvg}} \) is the weight trained by the federated averaging algorithm (FedAvg), and \( w_{\text{SGD}} \) is the weight trained using the global data set (assuming the server knows all data distributions). Research shows that Non-IID data can lead to an increase in model weight differences, thereby affecting model performance.

Furthermore, the paper points out that although existing methods perform well in some Non-IID scenarios, they cannot consistently outperform other algorithms and cannot change the inherent Non-IID characteristics of client data. Therefore, the authors propose a new framework, DCFL (Data Condensation aided Federated Learning with Non-IID awareness), aiming to mitigate the negative impacts of Non-IID data on federated learning model training, communication, and performance by efficiently using condensed data.

### Main contributions

1. **Client complementarity based on CKA**: Introduce the Centered Kernel Alignment (CKA) method to measure the complementarity between clients, guiding client selection and condensed data transmission. The server side calculates the complementarity between each client and other clients, and then groups the clients according to the complementarity, thereby achieving more fine-grained client selection, reducing the overall communication cost and improving the final model performance.
2. **Condensed data-assisted client model training with Non-IID awareness**: When the client model is trained, the real data cooperates with the condensed data from other clients in the same complementary group. In addition, the DSA (Differentiable Siamese Augmentation) data augmentation technique is also used, and the weight calculation formula of participating clients is reorganized according to the change in the number of local data sets of clients, to further reduce the number of communication rounds, make the training process more stable, and ultimately improve the model performance.
3. **Experimental verification**: Use four public data sets, MNIST, FashionMNIST, SVHN, and CIFAR-10, to verify the effectiveness of the DCFL algorithm. The experimental results show that DCFL outperforms traditional federated learning methods in terms of test accuracy and communication cost in different scenarios.

In summary, the main goal of this paper is to effectively deal with the Non-IID data problem in federated learning, improve model performance, and reduce communication overhead by introducing novel data condensation techniques and client selection strategies.
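The weight-divergence ratio defined earlier in this summary is simple to compute once both weight sets are flattened into vectors. The sketch below assumes flattened NumPy weight vectors and uses the standard FedAvg rule of averaging client weights in proportion to local dataset size. The names `fedavg` and `weight_divergence` are illustrative, and the reorganized weighting formula that DCFL applies after adding condensed data is not given in the text, so it is not reproduced here.

```python
import numpy as np


def fedavg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """Standard FedAvg aggregation: average weighted by local dataset size.

    Assumption: this is plain FedAvg, not DCFL's reorganized weighting.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def weight_divergence(w_fedavg: np.ndarray, w_sgd: np.ndarray) -> float:
    """Relative distance ||w_FedAvg - w_SGD|| / ||w_SGD|| (Euclidean norm)."""
    return float(np.linalg.norm(w_fedavg - w_sgd) / np.linalg.norm(w_sgd))


# Toy example with made-up numbers: one client matches the centralized
# weights exactly, the other has drifted in the first coordinate.
w_sgd = np.array([1.0, 2.0, 2.0])
clients = [np.array([1.0, 2.0, 2.0]), np.array([4.0, 2.0, 2.0])]
w_avg = fedavg(clients, client_sizes=[3, 1])
divergence = weight_divergence(w_avg, w_sgd)
```

In the toy example the drifted client holds a quarter of the data, so the aggregate moves by 0.75 in the first coordinate and the divergence is 0.75 / 3 = 0.25. A divergence of 0 would mean federated training recovered the centralized weights exactly.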

DCFL: Non-IID awareness Data Condensation aided Federated Learning
