Abstract: Federated learning (FL) is an efficient decentralized machine learning methodology for processing non-independent and identically distributed (non-IID) data, which arise from geographical and temporal differences in how data are distributed. Non-IID data generally exhibit substantial disparities in data distribution and features among clients. This differs fundamentally from the conventional assumption of independent and identically distributed (IID) data, in which all clients' data originate from the same distribution. Many factors affect the features of non-IID data, such as user preferences, data collection methods, and client characteristics. Data distribution, category proportions, and feature representation also affect the statistical properties of non-IID data. This article explores FL in depth, considering the diverse features and statistical properties of non-IID data. Specifically, we first discuss the impact of non-IID data on communication efficiency, model convergence, and FL accuracy. Non-IID data lead to increased communication overhead, imbalanced class distributions, and uneven local model updates, all of which affect FL convergence and performance. Then, we present the latest advanced techniques, such as data partitioning/sharing, client selection, differential privacy, and secure aggregation, that address the challenges non-IID data pose for communication efficiency and privacy protection. Furthermore, we show emerging applications and use cases of FL with non-IID data in various domains, such as healthcare, the Internet of Things, and edge computing. Overall, this survey provides a comprehensive understanding of FL with non-IID data, including the challenges, advancements, and practical applications in different areas.
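The abstract's point that non-IID clients differ in category proportions can be made concrete with a small simulation. The sketch below uses Dirichlet label-skew partitioning, a common way experimental FL studies generate non-IID client splits; it is an illustrative assumption, not a method taken from this survey. The helper name `dirichlet_label_partition` and its parameters are hypothetical. For each class, client shares are drawn from Dirichlet(`alpha`, ..., `alpha`) (sampled here as normalized Gamma draws from Python's standard `random` module), so a smaller `alpha` produces more skewed per-client class proportions.

```python
import random
from collections import Counter


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients with label skew.

    For every class, client proportions are drawn from a symmetric
    Dirichlet(alpha) distribution and that class's samples are dealt out
    accordingly. Smaller alpha -> more skewed (more non-IID) client splits.
    Returns a list of index lists, one per client.
    """
    if num_clients < 1 or alpha <= 0:
        raise ValueError("need num_clients >= 1 and alpha > 0")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Dirichlet sample via independent Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # guard against underflow for tiny alpha
            props = [1.0 / num_clients] * num_clients
        else:
            props = [g / total for g in gammas]

        # Turn cumulative proportions into cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(indices))))
        start = 0
        for client_id, end in enumerate(cuts + [len(indices)]):
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # 10 balanced classes
    for alpha in (0.1, 1000.0):
        parts = dirichlet_label_partition(labels, num_clients=5, alpha=alpha)
        print(f"alpha={alpha}")
        for cid, idxs in enumerate(parts):
            counts = Counter(labels[i] for i in idxs)
            print(" client", cid, dict(sorted(counts.items())))
```

With a very large `alpha` the split approaches IID (each client sees roughly equal class proportions), while `alpha=0.1` leaves many clients missing whole classes, which is one simple way to compare the IID and non-IID regimes the abstract contrasts.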

Exploring the Impact of Non-IID on Federated Learning

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Federated Learning on Non-IID Data Silos: An Experimental Study

Federated Learning on Non-Independent and Identically Distributed Data

Federated Learning on Non-IID Data: A Survey

A Survey of Federated Learning on Non-IID Data

Communication-efficient federated continual learning for distributed learning system with Non-IID data

Federated Learning with Non-IID Data: A Survey

Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

Handling Non-IID Data in Federated Learning: An Experimental Evaluation Towards Unified Metrics

A Clustered Federated Learning Method of User Behavior Analysis Based on Non-IID Data

Neural Collapse Inspired Federated Learning with Non-iid Data

The Effect of Training Parameters and Mechanisms on Decentralized Federated Learning based on MNIST Dataset

An Optimization Method for Non-IID Federated Learning Based on Deep Reinforcement Learning

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data

Cross-Domain Federated Data Modeling on Non-IID Data

FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction

Client Selection for Federated Learning With Non-IID Data in Mobile Edge Computing

Exploring Server-Side Data in Federated Learning: An Empirical Study

FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy