A Survey on Class Imbalance in Federated Learning

Jing Zhang, Chuanwen Li, Jianzhong Qi, Jiayuan He
2023-03-21
Abstract: Federated learning, which allows multiple client devices in a network to jointly train a machine learning model without directly exposing clients' data, is an emerging distributed learning technique valued for its privacy preservation. However, models trained with federated learning usually perform worse than their counterparts trained in the standard centralized learning mode, especially when the training data is imbalanced. In the context of federated learning, data imbalance may occur either locally on one client device, or globally across many devices. The complexity of these different types of data imbalance has posed challenges to the development of federated learning techniques, especially given the need to relieve data imbalance and preserve data privacy at the same time. Consequently, many attempts have been made in the literature to handle class imbalance in federated learning. In this paper, we present a detailed review of recent advancements along this line. We first introduce various types of class imbalance in federated learning, after which we review existing methods for estimating the extent of class imbalance without access to the actual data, so that data privacy is preserved. After that, we discuss existing methods for handling class imbalance in FL, along with the advantages and disadvantages of these approaches. We also summarize common evaluation metrics for class-imbalanced tasks, and point out potential future directions.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper surveys the issue of class imbalance in Federated Learning (FL) and reviews related research. Specifically, it addresses the following key issues:

1. **Definition and Classification**: It first defines the basic concepts of Federated Learning and categorizes it into three types according to the feature spaces and sample spaces of the participating clients: Horizontal Federated Learning, Vertical Federated Learning, and Federated Transfer Learning.
2. **Class Imbalance Problem**: It then details several types of class imbalance in Federated Learning, including Local Imbalance, Global Imbalance, and Mismatch Imbalance, and discusses how these imbalances affect model performance.
3. **Estimating Class Distribution**: It further reviews existing methods for estimating the degree of class imbalance, with particular attention to the need for data privacy protection in Federated Learning environments. It groups these methods into two types: distribution inference methods based on local distribution and distribution estimation methods based on model parameters.
   - **Methods Based on Local Distribution**: These methods require clients to upload the class distribution information of their local datasets to the central server, which aggregates it into the global class distribution. Although simple and feasible, this may leak some user privacy.
   - **Methods Based on Model Parameters**: These methods infer the class distribution indirectly by analyzing the model information uploaded by clients (such as gradients, losses, or predictions), thereby better protecting user privacy.

Through the above work, the paper systematically reviews and summarizes research progress on the class imbalance problem in Federated Learning and proposes possible future directions. This helps researchers understand the complexity of class imbalance in Federated Learning and design more effective solutions.
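
To make the local-distribution approach described above more concrete, the following is a minimal sketch of a server summing per-client class counts into a global class distribution. It is an illustration under stated assumptions, not the method of any particular paper: the function names, the client data, and the use of a max-to-min imbalance ratio as the measure are all hypothetical choices made for this example.

```python
from collections import Counter


def aggregate_class_counts(client_counts):
    """Sum per-client class counts into one global class count (hypothetical helper)."""
    total = Counter()
    for counts in client_counts:
        total.update(counts)  # adds each client's counts class by class
    return dict(total)


def imbalance_ratio(counts):
    """Largest class size divided by smallest nonzero class size (assumed measure)."""
    sizes = [n for n in counts.values() if n > 0]
    return max(sizes) / min(sizes)


# Hypothetical label counts that three clients report to the server.
clients = [
    {"cat": 90, "dog": 10},
    {"cat": 10, "dog": 90},
    {"cat": 50, "dog": 50},
]

global_counts = aggregate_class_counts(clients)
local_ratios = [imbalance_ratio(c) for c in clients]
global_ratio = imbalance_ratio(global_counts)
```

In this made-up data, the first two clients are strongly imbalanced locally while the aggregated distribution is balanced, which illustrates why local and global imbalance are treated as separate cases. Note that uploading raw class counts is exactly the information the summary flags as a possible privacy leak.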