Abstract: Federated learning (FL) is an efficient decentralized machine learning methodology for processing data that are not independent and identically distributed (non-IID), a situation that arises from geographical and temporal differences in how data are distributed. Non-IID data generally implies substantial disparities in data distributions and features across clients. This setting differs fundamentally from the conventional assumption of independent and identically distributed (IID) data, under which all clients' data are drawn from the same distribution. Many factors shape the features of non-IID data, such as user preferences, data collection methods, and client characteristics, while data distribution, category proportions, and feature representation also affect its statistical properties. This article explores FL in depth, taking into account the diverse features and statistical properties of non-IID data. Specifically, we first discuss the impact of non-IID data on communication efficiency, model convergence, and FL accuracy. The presence of non-IID data leads to increased communication overhead, imbalanced class distributions, and uneven local model updates, all of which affect FL convergence and performance. We then present recent advanced techniques, such as data partitioning/sharing, client selection, differential privacy, and secure aggregation, that address the challenges non-IID data poses for communication efficiency and privacy protection. Furthermore, we describe emerging applications and use cases of FL with non-IID data in domains such as healthcare, the Internet of Things, and edge computing. Overall, this survey provides a comprehensive understanding of FL with non-IID data, covering its challenges, advancements, and practical applications across different areas.
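The abstract characterizes non-IID data partly through differences in category proportions across clients. A common way to simulate this label skew in FL experiments (a general technique, not a method attributed to this survey) is to split each class among clients using proportions drawn from a Dirichlet distribution with concentration `alpha`: small `alpha` gives each client only a few dominant classes, while large `alpha` approaches an IID split. The sketch below is a minimal illustration assuming integer class labels; it uses only the Python standard library, and the names `dirichlet`, `partition_label_skew`, and `label_histograms` are hypothetical helpers introduced here.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    # Sample a symmetric Dirichlet(alpha, ..., alpha) vector by normalizing Gamma draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_label_skew(labels, num_clients, alpha, seed=0):
    """Assign every sample index to exactly one client with Dirichlet label skew.

    Hypothetical helper: for each class, its samples are divided among clients
    in proportions drawn from Dirichlet(alpha).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into monotone cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds = [0] + cuts + [len(idxs)]
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients


def label_histograms(labels, clients):
    # Per-client class counts, useful for inspecting how skewed a split is.
    return [Counter(labels[i] for i in idxs) for idxs in clients]


if __name__ == "__main__":
    # Toy labels: 10 classes x 100 samples (hypothetical data, illustration only).
    labels = [c for c in range(10) for _ in range(100)]
    for alpha in (0.1, 100.0):
        parts = partition_label_skew(labels, num_clients=4, alpha=alpha, seed=42)
        print(f"alpha={alpha}")
        for cid, hist in enumerate(label_histograms(labels, parts)):
            print(f"  client {cid}: {len(parts[cid])} samples, {len(hist)} classes")
```

Running the script with both `alpha` values lets you compare how many classes each client sees; the exact counts depend on the random seed, so none are quoted here.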

Federated Loss Exploration for Improved Convergence on Non-IID Data

FedDGP: Disentangling Global and Personal Models for Federated Learning

Federated Learning for Non-IID Data Via Unified Feature Learning and Optimization Objective Alignment

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

FedEL: Federated Ensemble Learning for Non-IID Data

A Survey of Federated Learning on Non-IID Data

Federated Learning on Non-IID and Long-Tailed Data via Dual-Decoupling

Tackling Data Heterogeneity in Federated Learning via Loss Decomposition

Enhancing Generalization Robustness of Federated Learning in Highly Heterogeneous Environments

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data

Adaptive Federated Learning on Non-IID Data with Resource Constraint

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

On the Convergence of Clustered Federated Learning

Federated Learning with Non-IID Data: A Survey

Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

Fine-tuning Global Model Via Data-Free Knowledge Distillation for Non-IID Federated Learning

Completely Heterogeneous Federated Learning

FedEdge: Accelerating Edge-Assisted Federated Learning

FedH2L: A Federated Learning Approach with Model and Statistical Heterogeneity

Non-IID Federated Learning with Sharper Risk Bound