FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

Yu Qiao, Huy Q. Le, Mengchun Zhang, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong
2024-09-11
Abstract: Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged signals as a form of regularization and only focusing on one aspect of these non-IID challenges. Given these limitations, this paper clarifies these two non-IID challenges and attempts to introduce cluster representation to address them from both local and global perspectives. Specifically, we propose a dual-clustered feature contrast-based FL framework with dual focuses. First, we employ clustering on the local representations of each client, aiming to capture intra-class information based on these local clusters at a high level of granularity. Then, we facilitate cross-client knowledge sharing by pulling the local representation closer to clusters shared by clients with similar semantics while pushing them away from clusters with dissimilar semantics. Second, since the sizes of local clusters belonging to the same class may differ for each client, we further utilize clustering on the global side and conduct averaging to create a consistent global signal for guiding each local training in a contrastive manner. Experimental results on multiple datasets demonstrate that our proposal achieves comparable or superior performance gain under intra-domain and inter-domain heterogeneity.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of non-independent and identically distributed (non-IID) data in federated learning (FL), specifically inter-domain heterogeneity and imbalanced intra-domain heterogeneity. It points out that current FL methods mainly use averaged signals as a form of regularization and address only one aspect of the non-IID challenge. The paper therefore proposes a Federated Dual-Clustered Feature Contrast (FedCCL) framework that tackles both forms of heterogeneity from local and global perspectives.

### Main Contributions of the Paper:

1. **Overview of non-IID Challenges**: The paper describes the non-IID challenges in detail from both intra-domain and inter-domain perspectives and proposes a new federated learning framework that uses a dual-clustered feature contrast strategy to train local models that generalize well. To the best of the authors' knowledge, this is the first time a dual-clustered feature contrast strategy has been introduced in federated learning.
2. **Proposed Dual-Clustered Feature Contrast Framework**: This framework includes two new components:
   - **Local Clustered Feature Contrast**: Clusters each client's local representations and uses these local clusters for cross-client contrastive learning. A local representation is pulled toward clusters with similar semantics and pushed away from clusters with dissimilar semantics, which helps the model capture fine-grained intra-class differences and promotes knowledge sharing between clients.
   - **Global Clustered Feature Contrast**: Applies clustering and averaging on the global side to produce a consistent global signal that guides each local training process in a contrastive manner, further improving the generalization ability of local models.
3. **Experimental Validation**: The proposed method was evaluated on multiple datasets, and the results show comparable or superior performance under intra-domain and inter-domain heterogeneity. Additionally, privacy protection evaluations under adversarial attacks were conducted, showing that the method is more robust than existing robust federated learning baselines.

### Specific Problems Addressed:

- **Imbalanced Intra-Domain Heterogeneity**: Data comes from the same domain but with different label distributions and quantities. For example, some categories may have far more samples than others, which biases model optimization towards the dominant labels.
- **Inter-Domain Heterogeneity**: Data comes from different domains, and sample quantities and label distributions may also differ. For example, data from different domains may have different feature distributions, which biases model optimization towards the dominant domains.

By introducing the dual-clustered feature contrast strategy, the paper aims to improve the performance and generalization ability of federated learning models under these complex data distributions.
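
The two components described above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: this summary does not specify the clustering algorithm, the similarity measure, or the loss, so plain k-means, cosine similarity, an InfoNCE-style loss, the temperature `tau`, and every function name below (`kmeans`, `local_clusters`, `global_signal`, `cluster_contrast_loss`) are assumptions chosen for illustration. The toy 2-D features stand in for encoder outputs.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed algorithm); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            buckets[nearest].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def local_clusters(features, labels, k):
    """Client side: cluster each class separately -> {label: [centroid, ...]}."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    return {y: kmeans(fs, min(k, len(fs))) for y, fs in by_class.items()}


def global_signal(client_clusters, k):
    """Server side: pool all clients' centroids per class, cluster them again,
    then average the resulting centers into one vector per class."""
    pooled = {}
    for clusters in client_clusters:
        for y, cs in clusters.items():
            pooled.setdefault(y, []).extend(cs)
    signal = {}
    for y, cs in pooled.items():
        centers = kmeans(cs, min(k, len(cs)))
        signal[y] = [sum(col) / len(centers) for col in zip(*centers)]
    return signal


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def cluster_contrast_loss(z, y, clusters, tau=0.5):
    """InfoNCE-style loss (assumed form): centroids of class y are positives,
    centroids of every other class are negatives."""
    pos = sum(math.exp(cosine(z, c) / tau) for c in clusters.get(y, []))
    neg = sum(
        math.exp(cosine(z, c) / tau)
        for label, cs in clusters.items() if label != y
        for c in cs
    )
    if pos == 0.0:  # no cluster for this class yet
        return 0.0
    return -math.log(pos / (pos + neg))


# Toy data: two clients with different numbers of samples per class.
feats_a = [[1.0, 0.1], [0.9, 0.0], [1.1, -0.1], [1.0, 0.0],
           [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1], [0.0, 1.0]]
labels_a = [0, 0, 0, 0, 1, 1, 1, 1]
feats_b = [[0.8, 0.2], [0.9, 0.1], [1.2, 0.0], [0.2, 0.8], [0.0, 1.2]]
labels_b = [0, 0, 0, 1, 1]

clusters_a = local_clusters(feats_a, labels_a, k=2)
clusters_b = local_clusters(feats_b, labels_b, k=2)
shared = {y: clusters_a[y] + clusters_b[y] for y in clusters_a}
global_vecs = global_signal([clusters_a, clusters_b], k=2)

z = [1.0, 0.05]  # a local representation whose true class is 0
local_loss = cluster_contrast_loss(z, 0, shared)
global_loss = cluster_contrast_loss(z, 0, {y: [v] for y, v in global_vecs.items()})
print("local cluster contrast loss:", local_loss)
print("global cluster contrast loss:", global_loss)
```

In this sketch each client shares only per-class centroids rather than raw features, and the server reduces them to one vector per class so that every client contrasts against the same global signal regardless of how large its own local clusters are. How the two loss terms are weighted against the supervised loss is left out, since the summary does not state it.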