Abstract: Federated Learning (FL) is a machine-learning approach that enables collaborative model training across multiple decentralized edge devices, each holding local data samples, without exchanging those samples. Training is coordinated either by a central server or over a peer-to-peer network. FL is particularly significant in industries such as healthcare and finance, where data privacy is paramount. However, training a model in the federated learning setting raises several challenges, one of the most prominent being the heterogeneity of data distribution across edge devices. The data are typically not independent and identically distributed (non-IID), which makes model convergence harder. This report examines the issues arising from non-IID and heterogeneous data and surveys current algorithms designed to address them.
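To make the non-IID notion in the abstract concrete, the sketch below simulates label skew, one common form of heterogeneity, by sorting a toy dataset by label and handing each client a few contiguous shards. The function name `shard_partition`, its parameters, and the toy labels are assumptions made for illustration; the abstract does not prescribe any particular partitioning scheme.

```python
import random
from collections import Counter


def shard_partition(labels, num_clients, shards_per_client, seed=0):
    """Split sample indices into label-skewed client partitions.

    Indices are sorted by label, cut into equal contiguous shards, and each
    client receives `shards_per_client` randomly chosen shards, so most
    clients end up seeing only a few classes.
    """
    rng = random.Random(seed)
    num_shards = num_clients * shards_per_client
    shard_size = len(labels) // num_shards
    # Sort indices by label so that each shard is dominated by one class.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    rng.shuffle(shards)
    partitions = []
    for c in range(num_clients):
        own = shards[c * shards_per_client:(c + 1) * shards_per_client]
        partitions.append([i for shard in own for i in shard])
    return partitions


# Hypothetical toy dataset: 10 classes with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]
clients = shard_partition(labels, num_clients=5, shards_per_client=2)

for cid, idx in enumerate(clients):
    # Each client's label histogram differs sharply from the global one.
    print(cid, dict(Counter(labels[i] for i in idx)))
```

In this toy setup every client holds only two of the ten classes, whereas an IID split would give each client roughly equal shares of all ten.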

What problem does this paper attempt to address?

This paper explores the challenges that heterogeneous data and data that are not independent and identically distributed (non-IID) pose for Federated Learning (FL), and reviews several techniques for addressing them.

### Main Issues

The core issues the paper attempts to address are:

- **Heterogeneous Data**: In federated learning, data from different devices usually follow different distributions, making it difficult to train a single global model.
- **Non-IID Data**: Data samples are not independent and identically distributed, so the data distribution may differ from device to device, which can hinder model convergence or even cause divergence.

### Specific Challenges

- **Model Heterogeneity**: Different clients have different data distributions, making it difficult for the trained global model to perform well on all clients.
- **Convergence Challenges**: Heterogeneous and non-IID data may slow model convergence or prevent it altogether.
- **Sampling Bias**: Non-IID data may bias the model towards specific subgroups, so sampling bias must be corrected to preserve fairness and generalization.
- **Adaptability Issues**: Client data change over time, and the global model must adapt quickly to local changes without degrading overall performance.
- **Robustness**: Building models that generalize across different data sources is a key challenge in federated learning.

### Solutions

The paper introduces several methods to address the above challenges:

1. **FedDF (Federated Distillation Fusion)**: Uses knowledge distillation to fuse the knowledge of multiple client models into a global model, improving model accuracy and convergence speed.
2. **FedLbl (Label-based Aggregation Method)**: Aggregates local models based on the number of categories in each client's data to better handle heterogeneous data.
3. **Def-KT (Decentralized Mutual Learning Algorithm)**: In a decentralized federated learning setup, trains models through mutual knowledge transfer, improving generalization and the ability to learn from unseen data samples.

### Conclusion

The paper summarizes the importance of handling heterogeneous and non-IID data in federated learning and presents several effective solutions. Future research directions include further optimizing model aggregation techniques, developing dynamic adaptation methods, and handling sparse and imbalanced data.
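The label-based aggregation idea described above can be sketched by contrasting it with sample-count weighting, the usual FedAvg-style choice. The summary only says that FedLbl aggregates "based on the number of categories in client data" and does not give the exact rule, so the code below assumes the simplest reading: each client's coefficient is proportional to the number of distinct labels it holds. All function names and the toy values are hypothetical, and this should not be taken as the FedLbl algorithm itself.

```python
def weighted_average(client_weights, coefficients):
    """Average flat parameter vectors using the given per-client coefficients."""
    total = float(sum(coefficients))
    if total <= 0:
        raise ValueError("coefficients must sum to a positive value")
    dim = len(client_weights[0])
    return [
        sum(c * w[k] for w, c in zip(client_weights, coefficients)) / total
        for k in range(dim)
    ]


def aggregate_by_sample_count(client_weights, client_labels):
    # FedAvg-style weighting: clients with more samples count more.
    return weighted_average(client_weights, [len(y) for y in client_labels])


def aggregate_by_label_count(client_weights, client_labels):
    # Assumed label-based weighting: clients covering more classes count more.
    return weighted_average(client_weights, [len(set(y)) for y in client_labels])


# Hypothetical toy round with two clients and two-parameter models.
client_weights = [[1.0, 0.0], [0.0, 1.0]]
client_labels = [
    [0] * 90,                        # many samples, a single class
    [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],  # few samples, five classes
]

by_samples = aggregate_by_sample_count(client_weights, client_labels)
by_labels = aggregate_by_label_count(client_weights, client_labels)
print("sample-count weighting:", by_samples)
print("label-count weighting: ", by_labels)
```

Under sample-count weighting the single-class client dominates the global model; under the assumed label-count weighting the client that covers more classes has the larger influence, which is the intuition behind weighting by label diversity when data are label-skewed.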

A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL
