FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering

Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng

2024-03-07

Abstract: Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. A key challenge in FL is the uneven data distribution across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. Clustered federated learning (CFL) addresses this challenge by grouping clients based on the similarity of their data distributions. However, existing CFL approaches require a large number of communication rounds for stable cluster formation and rely on a predefined number of clusters, thus limiting their flexibility and adaptability. This paper proposes FedClust, a novel CFL approach leveraging correlations between local model weights and client data distributions. FedClust groups clients into clusters in a one-shot manner using strategically selected partial model weights and dynamically accommodates newcomers in real-time. Experimental results demonstrate FedClust outperforms baseline approaches in terms of accuracy and communication costs.

Distributed, Parallel, and Cluster Computing; Machine Learning

What problem does this paper attempt to address?

The paper addresses the poor model training performance that Federated Learning (FL) suffers when client data is not independent and identically distributed (Non-IID). Specifically:

1. **Non-Independent and Identically Distributed (Non-IID) Data**: Conventional machine learning usually assumes that training samples are independent and identically distributed (IID). In federated learning, however, data distributions can differ substantially across clients, which violates the IID assumption and degrades model performance.
2. **Limitations of Existing Methods**:
   - **Many Communication Rounds**: Existing Clustered Federated Learning (CFL) methods require a large number of communication rounds to form stable clusters.
   - **Predefined Number of Clusters**: These methods usually need the number of clusters to be fixed in advance, which limits their flexibility and adaptability.
   - **Using All Model Weights**: Existing CFL methods typically use all model weights for clustering, which increases computational and communication costs.

To address these issues, the paper proposes a new CFL method, FedClust. FedClust leverages the correlation between local model weights and client data distributions to group clients in a single communication round, using strategically selected partial model weights, and can dynamically accommodate newly joined clients in real time. Experimental results show that FedClust outperforms baseline methods in terms of accuracy and communication cost.
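The idea of one-shot, weight-driven clustering can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say which partial weights are selected or which clustering procedure is used. The sketch assumes that each client uploads a flattened slice of its local model (called `final_layer_weights` here, a hypothetical choice), that similarity is measured by Euclidean distance, and that clusters are formed by single-linkage grouping cut at a hypothetical `threshold`, so the number of clusters is not fixed in advance. The helper names `cluster_clients` and `assign_newcomer` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_clients(weights: List[Sequence[float]], threshold: float) -> List[int]:
    """Single-linkage clustering cut at `threshold`, implemented with union-find.

    Returns one cluster label per client. The number of clusters is not
    predefined; it follows from the threshold (an assumed hyperparameter).
    """
    parent = list(range(len(weights)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of clients whose partial weights are close enough.
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            if euclidean(weights[i], weights[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(weights))]


def assign_newcomer(new_weights: Sequence[float],
                    weights: List[Sequence[float]],
                    labels: List[int],
                    threshold: float) -> int:
    """Place a newcomer in its nearest client's cluster, or open a new cluster."""
    dists = [euclidean(new_weights, w) for w in weights]
    nearest = min(range(len(weights)), key=dists.__getitem__)
    if dists[nearest] <= threshold:
        return labels[nearest]
    return max(labels) + 1


# Toy data: two groups of clients with clearly different partial weights.
final_layer_weights = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = cluster_clients(final_layer_weights, threshold=1.0)
print(labels)  # -> [0, 0, 1, 1]
print(assign_newcomer([4.9, 5.0], final_layer_weights, labels, 1.0))  # -> 1
```

Because clustering needs only one upload of partial weights per client, the grouping happens in a single round in this sketch, and a newcomer is placed without re-clustering the existing clients. A real system would need a principled way to choose the weight slice and the threshold.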
