Abstract: Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged signals as a form of regularization and only focusing on one aspect of these non-IID challenges. Given these limitations, this paper clarifies these two non-IID challenges and attempts to introduce cluster representation to address them from both local and global perspectives. Specifically, we propose a dual-clustered feature contrast-based FL framework with dual focuses. First, we employ clustering on the local representations of each client, aiming to capture intra-class information based on these local clusters at a high level of granularity. Then, we facilitate cross-client knowledge sharing by pulling the local representation closer to clusters shared by clients with similar semantics while pushing them away from clusters with dissimilar semantics. Second, since the sizes of local clusters belonging to the same class may differ for each client, we further utilize clustering on the global side and conduct averaging to create a consistent global signal for guiding each local training in a contrastive manner. Experimental results on multiple datasets demonstrate that our proposal achieves comparable or superior performance gain under intra-domain and inter-domain heterogeneity.

What problem does this paper attempt to address?

This paper addresses the problem of handling non-independent and identically distributed (non-IID) data in Federated Learning (FL), in particular imbalanced intra-domain heterogeneity and inter-domain heterogeneity. The paper points out that current federated learning methods mainly use averaged signals as a form of regularization, and that this approach addresses only one aspect of the non-IID challenges. It therefore proposes a new Federated Dual-Clustered Feature Contrast (FedCCL) framework that tackles both forms of heterogeneity from local and global perspectives.

### Main Contributions of the Paper:

1. **Overview of non-IID Challenges**: The paper describes the non-IID challenges in detail from both intra-domain and inter-domain perspectives and proposes a new federated learning framework that includes a dual-clustered feature contrast strategy to train local models with good generalization capabilities. To the best of the authors' knowledge, this is the first time a dual-clustered feature contrast strategy has been introduced in federated learning.
2. **Proposed Dual-Clustered Feature Contrast Framework**: This framework includes two new components:
   - **Local Clustered Feature Contrast**: Promotes cross-client contrastive learning through local clustered signals, enhancing the model's ability to capture subtle differences within each category and promoting knowledge sharing between clients.
   - **Global Clustered Feature Contrast**: Utilizes unbiased global signals to guide each local training process, further improving the generalization ability of local models.
3. **Experimental Validation**: The proposed method was evaluated on multiple datasets, and the results show comparable or superior performance under intra-domain and inter-domain heterogeneity. Additionally, privacy protection evaluations under adversarial attacks were conducted, showing that the method is more robust than existing robust federated learning baseline methods.

### Specific Problems Addressed:

- **Imbalanced Intra-Domain Heterogeneity**: Data comes from the same domain but with different label distributions and quantities. For example, the number of samples for some categories may be much higher than for others, biasing model optimization towards the dominant labels.
- **Inter-Domain Heterogeneity**: Data comes from different domains, and the sample quantities and label distributions may differ. For example, data from different domains may have different feature distributions, biasing model optimization towards the dominant domains.

By introducing the dual-clustered feature contrast strategy, the paper aims to improve the performance and generalization ability of federated learning models under these complex data distributions.
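
To make the local and global contrast steps described above more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (`kmeans`, `local_clusters`, `cluster_contrast_loss`, `global_signals`), the use of plain k-means, the InfoNCE-style loss, and the temperature `tau` are all assumptions made for illustration; the summary does not specify the clustering algorithm or the exact loss. The global step here only averages pooled centroids per class, which simplifies the "clustering on the global side, then averaging" that the abstract describes.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (an assumed choice); returns k centroids of the rows of x."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def local_clusters(features, labels, k):
    """Cluster each class's features separately on one client.

    Returns the stacked centroids and the class label of each centroid.
    """
    cents, cent_labels = [], []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        kc = min(k, len(class_feats))
        cents.append(kmeans(class_feats, kc))
        cent_labels.extend([c] * kc)
    return np.vstack(cents), np.array(cent_labels)

def cluster_contrast_loss(z, y, centroids, centroid_labels, tau=0.5):
    """InfoNCE-style loss for one feature z with label y.

    Centroids with the same label act as positives (pulled closer); all other
    centroids act as negatives (pushed away). Assumes at least one centroid
    carries label y and that no vector has zero norm.
    """
    z = z / np.linalg.norm(z)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = c @ z / tau          # cosine similarity scaled by temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    return -np.log(weights[centroid_labels == y].sum() / weights.sum())

def global_signals(all_centroids, all_labels):
    """Average centroids pooled from all clients into one signal per class."""
    classes = np.unique(all_labels)
    signals = np.vstack([all_centroids[all_labels == c].mean(axis=0) for c in classes])
    return signals, classes
```

In a training loop, each client would compute `cluster_contrast_loss` once against the centroids shared by other clients (local contrast) and once against the output of `global_signals` (global contrast), adding both terms to its supervised loss. How the terms are weighted is left open here because the summary does not state it.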

FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

Contrastive encoder pre-training-based clustered federated learning for heterogeneous data

FedCRL: Personalized Federated Learning with Contrastive Shared Representations for Label Heterogeneity in Non-IID Data

FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering

Dual-Segment Clustering Strategy for Hierarchical Federated Learning in Heterogeneous Wireless Environments

Federated Momentum Contrastive Clustering

FedNorm: an Efficient Federated Learning Framework with Dual Heterogeneity Coexistence on Edge Intelligence Systems

Distributed Unsupervised Visual Representation Learning with Fused Features

FedDA: Resource-adaptive Federated Learning with Dual-Alignment Aggregation Optimization for Heterogeneous Edge Devices

Flexible Clustered Federated Learning for Client-Level Data Distribution Shift

Completely Heterogeneous Federated Learning

FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering

CCFC++: Enhancing Federated Clustering through Feature Decorrelation

CCFC: Bridging Federated Clustering and Contrastive Learning

FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning

FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Federated learning with incremental clustering for heterogeneous data

Unsupervised Federated Optimization at the Edge: D2D-Enabled Learning without Labels

Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning

Enhancing Edge-Assisted Federated Learning with Asynchronous Aggregation and Cluster Pairing

CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data