Abstract: Federated learning, which allows multiple client devices in a network to jointly train a machine learning model without directly exposing clients' data, is an emerging distributed learning technique valued for its privacy-preserving nature. However, models trained with federated learning usually perform worse than their counterparts trained in the standard centralized learning mode, especially when the training data is imbalanced. In the context of federated learning, data imbalance may occur either locally on one client device or globally across many devices. The complexity of these different types of data imbalance poses challenges to the development of federated learning techniques, especially given the need to relieve data imbalance while preserving data privacy. Many attempts have therefore been made in the literature to handle class imbalance in federated learning. In this paper, we present a detailed review of recent advancements along this line. We first introduce the various types of class imbalance in federated learning, and then review existing methods for estimating the extent of class imbalance without access to the actual data, so that data privacy is preserved. After that, we discuss existing methods for handling class imbalance in FL, along with the advantages and disadvantages of these approaches. We also summarize common evaluation metrics for class-imbalanced tasks and point out potential future directions.

What problem does this paper attempt to address?

The paper primarily explores the issue of class imbalance in Federated Learning (FL) and reviews related research. Specifically, the paper aims to address the following key issues:

1. **Definition and Classification**: First, it defines the basic concepts of Federated Learning and distinguishes three types of Federated Learning based on how feature spaces and sample spaces differ across clients: Horizontal Federated Learning, Vertical Federated Learning, and Federated Transfer Learning.
2. **Class Imbalance Problem**: It then details several types of class imbalance in Federated Learning, including Local Imbalance, Global Imbalance, and Mismatch Imbalance, and discusses how these imbalances affect model performance.
3. **Estimating Class Distribution**: The paper further reviews existing methods for estimating the degree of class imbalance, with particular attention to the need for data privacy protection in Federated Learning environments. It groups them into two types: distribution inference methods based on local distribution and distribution estimation methods based on model parameters.
   - **Methods Based on Local Distribution**: These methods require clients to upload the class distribution information of their local datasets to the central server, which aggregates it into the global class distribution. Although simple and feasible, this may leak some user privacy.
   - **Methods Based on Model Parameters**: These methods infer the class distribution indirectly by analyzing the information uploaded by clients (such as gradients, losses, or predictions), thereby better protecting user privacy.

Through the above work, the paper aims to systematically review and summarize the research progress in addressing the class imbalance problem in Federated Learning and to propose possible future directions. This helps researchers understand the complexity of class imbalance in Federated Learning and design more effective solutions.
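The local-distribution approach described above can be illustrated with a minimal sketch. It assumes each client reports a plain table of per-class sample counts and that the server simply sums them. The function names, the client data, and the choice of the majority-to-minority ratio as the imbalance measure are illustrative assumptions, not the survey's own method.

```python
from collections import Counter


def aggregate_class_counts(client_counts):
    """Sum per-client class counts into one global count table (server side)."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)  # adds counts class by class
    return dict(total)


def imbalance_ratio(counts):
    """Largest class count divided by the smallest nonzero class count.

    Assumes at least one class has a nonzero count.
    """
    present = [n for n in counts.values() if n > 0]
    return max(present) / min(present)


# Hypothetical label counts reported by three clients.
client_counts = [
    {"cat": 90, "dog": 10},
    {"cat": 10, "dog": 90},
    {"cat": 50, "dog": 50},
]

global_counts = aggregate_class_counts(client_counts)
local_ratios = [imbalance_ratio(c) for c in client_counts]
global_ratio = imbalance_ratio(global_counts)
```

In this made-up example two clients are heavily skewed locally while the aggregated distribution is balanced, which is why local and global imbalance are treated as separate cases. Uploading raw counts like this is also exactly what exposes information about each client's data.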

A Survey on Class Imbalance in Federated Learning
