Addressing Skewed Heterogeneity via Federated Prototype Rectification with Personalization

Shunxin Guo, Hongsong Wang, Shuxia Lin, Zhiqiang Kou, Xin Geng
2024-08-23
Abstract: Federated learning is an efficient framework designed to facilitate collaborative model training across multiple distributed devices while preserving user data privacy. A significant challenge of federated learning is data-level heterogeneity, i.e., skewed or long-tailed distribution of private data. Although various methods have been proposed to address this challenge, most of them assume that the underlying global data is uniformly distributed across all clients. This paper investigates data-level heterogeneity federated learning with a brief review and redefines a more practical and challenging setting called Skewed Heterogeneous Federated Learning (SHFL). Accordingly, we propose a novel Federated Prototype Rectification with Personalization which consists of two parts: Federated Personalization and Federated Prototype Rectification. The former aims to construct balanced decision boundaries between dominant and minority classes based on private data, while the latter exploits both inter-class discrimination and intra-class consistency to rectify empirical prototypes. Experiments on three popular benchmarks show that the proposed approach outperforms current state-of-the-art methods and achieves balanced performance in both personalization and generalization.
Machine Learning; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses data heterogeneity in Federated Learning (FL), particularly the challenges caused by skewed distributions. Specifically, it focuses on **Skewed Heterogeneous Federated Learning (SHFL)**, a setting in which class distributions and sample counts differ significantly across clients, and asks how to train a federated model effectively under these conditions.

### Background and Challenges

1. **Data Heterogeneity**: In federated learning, client data is usually not independent and identically distributed (non-IID), so class distributions and decision boundaries can differ significantly between clients. This heterogeneity causes local model parameters to diverge, which degrades the performance of the aggregated global model.
2. **Skewed Distribution**: In practical applications, training samples often follow a skewed class distribution, where a few classes have far more samples than the rest. This is particularly evident in fields like medical diagnosis, where the distribution of disease samples can be very uneven across medical institutions.

### Limitations of Existing Methods

Although various methods have been proposed to address data heterogeneity, most assume that the overall (global) data is uniformly distributed, which rarely holds in the real world. Because they do not fully account for long-tailed distributions, existing methods perform poorly on classes with few samples.

### Main Contributions of the Paper

1. **Redefining the Problem**: The paper reviews data heterogeneity in federated learning and defines Skewed Heterogeneous Federated Learning (SHFL), exploring how a model can achieve both generalized and personalized performance.
2. **Proposing a New Framework**: The paper proposes a new framework called Federated Prototype Rectification with Personalization (FedPRP), which includes two modules:
   - **Federated Personalization**: Designs a personalized classifier for each heterogeneous client, whose skewed class distribution differs from that of other clients, to preserve the personalized performance of local models.
   - **Federated Prototype Rectification**: Improves representation learning by introducing an inter-class discrimination loss and an intra-class consistency loss, keeping local training consistent across clients so that the aggregated global model is more robust and generalizes better.
3. **Experimental Validation**: The paper reports experiments on three popular benchmark datasets in which FedPRP outperforms current state-of-the-art methods and balances personalization and generalization performance.

### Summary

This paper addresses data heterogeneity and skewed distributions in federated learning by proposing a new framework, FedPRP, which aims to improve model performance through personalization and prototype rectification. The method targets practical scenarios with uneven data distributions, where the reported results indicate gains in both the performance and the robustness of federated learning.
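
The summary above names an intra-class consistency loss and an inter-class discrimination loss computed around class prototypes, but it gives no formulas. The following is a minimal NumPy sketch of what such terms could look like, under stated assumptions: a prototype is taken to be the mean feature vector of a class on one client, consistency is measured as squared Euclidean distance to the sample's own prototype, and discrimination is a hinge penalty on prototype pairs closer than a margin. The function names and the `margin` parameter are hypothetical and are not taken from the paper; FedPRP's actual losses may differ.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class. Classes absent on this client stay NaN."""
    protos = np.full((num_classes, features.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos


def intra_class_consistency(features, labels, protos):
    """Assumed form: mean squared distance of each sample to its class prototype."""
    diffs = features - protos[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def inter_class_discrimination(protos, margin=1.0):
    """Assumed form: hinge penalty on prototype pairs closer than `margin`."""
    present = protos[~np.isnan(protos).any(axis=1)]  # skip classes with no samples
    k = len(present)
    if k < 2:
        return 0.0
    dists = np.linalg.norm(present[:, None, :] - present[None, :, :], axis=-1)
    upper = np.triu_indices(k, k=1)  # each unordered pair once
    return float(np.mean(np.maximum(0.0, margin - dists[upper])))


# Toy client with a skewed label distribution: class 2 has no samples at all.
features = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
protos = class_prototypes(features, labels, num_classes=3)
consistency = intra_class_consistency(features, labels, protos)
discrimination = inter_class_discrimination(protos, margin=10.0)
```

Leaving missing classes as NaN, rather than zero vectors, makes it explicit that a client with a skewed distribution may contribute no prototype for some classes, which is the situation the SHFL setting describes.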