FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning

Pranab Sahoo, Ashutosh Tripathi, Sriparna Saha, Samrat Mondal
2024-12-06
Abstract: Federated Learning (FL) marks a transformative approach to distributed model training by combining locally optimized models from various clients into a unified global model. While FL preserves data privacy by eliminating centralized storage, it encounters significant challenges such as performance degradation, slower convergence, and reduced robustness of the global model due to the heterogeneity in client data distributions. Among the various forms of data heterogeneity, label skew emerges as a particularly formidable and prevalent issue, especially in domains such as image classification. To address these challenges, we begin with comprehensive experiments to pinpoint the underlying issues in the FL training process. Based on our findings, we then introduce an innovative dual-strategy approach designed to effectively resolve these issues. First, we introduce an adaptive loss function for client-side training, meticulously crafted to preserve previously acquired knowledge while maintaining an optimal equilibrium between local optimization and global model coherence. Secondly, we develop a dynamic aggregation strategy for aggregating client models at the server. This approach adapts to each client's unique learning patterns, effectively addressing the challenges of diverse data across the network. Our comprehensive evaluation, conducted across three diverse real-world datasets, coupled with theoretical convergence guarantees, demonstrates the superior efficacy of our method compared to several established state-of-the-art approaches.
Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses data heterogeneity in Federated Learning (FL), in particular the performance degradation, slow convergence, and reduced robustness of the global model caused by label skew. Specifically:

1. **Challenges of data heterogeneity**: Data distributions differ widely across clients in federated learning, and label skew is particularly severe in tasks such as image classification. This heterogeneity can drive training toward "sharp minima", which hurts the model's generalization ability and stability.
2. **Limitations of existing methods**: Traditional static aggregation methods (such as FedAvg) handle non-independent and identically distributed (non-IID) data poorly and struggle to adapt to dynamically changing data distributions and client drift, so model performance declines.

To solve these problems, the paper proposes a dual-strategy method named FedDUAL, which includes:

- **Adaptive loss function**: An adaptive loss function is introduced during client training. By adjusting the trade-off between the local and global models, it allows local optimization while keeping the local model consistent with the global model. The loss combines cross-entropy with a Kullback-Leibler (KL) divergence term that quantifies the difference between the probability distributions of the local and global model weights.
- **Dynamic aggregation strategy**: On the server side, a dynamic aggregation method based on the Wasserstein barycenter is applied to the gradients of the final layer. This method better integrates the learning behaviors of different clients and reduces the negative impact of non-IID data, which improves the stability and generalization ability of the model.
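The adaptive loss described above can be illustrated with a minimal sketch. This is not the paper's exact formulation: the softmax normalization used to turn weight vectors into probability distributions, the convex combination `(1 - beta) * CE + beta * KL`, and the names `adaptive_loss` and `beta` are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Normalize a list of floats into a probability distribution."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of a single example given raw logits and the true class index."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def adaptive_loss(logits, label, local_weights, global_weights, beta):
    """Hypothetical combined loss: task loss plus a pull toward the global model.

    Assumption: weights are mapped to distributions with a softmax, and
    beta in [0, 1] sets the local/global trade-off.
    """
    ce = cross_entropy(logits, label)
    kl = kl_divergence(softmax(local_weights), softmax(global_weights))
    return (1.0 - beta) * ce + beta * kl


# Example: a local model that has drifted from the global one pays a KL penalty.
drifted = adaptive_loss([2.0, 0.5], 0, [1.0, -1.0], [0.0, 0.0], beta=0.3)
aligned = adaptive_loss([2.0, 0.5], 0, [0.0, 0.0], [0.0, 0.0], beta=0.3)
```

In this sketch a larger `beta` keeps the client closer to the global model, while a smaller `beta` favors fitting the local data. How the trade-off is adapted during training is not specified in this summary.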
Through these two strategies, FedDUAL can develop more robust and general-purpose federated models in highly heterogeneous data environments, significantly improving the performance and convergence speed of the model. Experimental results show that this method outperforms the existing state-of-the-art methods on multiple real-world datasets.
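To give a feel for the Wasserstein barycenter used in the server-side aggregation, the sketch below computes it in the simplest setting: one-dimensional empirical distributions with the same number of samples, where the Wasserstein-2 barycenter is obtained by averaging the sorted values (quantile averaging). Treating each client's final-layer gradient as such a distribution over its entries is an assumption made here. The function name `wasserstein_barycenter_1d` is hypothetical, and the sketch does not cover how FedDUAL maps the result back onto model parameters.

```python
def wasserstein_barycenter_1d(samples, weights=None):
    """Wasserstein-2 barycenter of equal-size 1D empirical distributions.

    samples: list of lists of floats, one list per client (assumed here to be
             the flattened final-layer gradient entries of that client).
    weights: optional non-negative client weights that sum to 1;
             defaults to uniform weights.
    Returns the barycenter's support points in ascending order.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all clients must provide the same number of entries")
    if weights is None:
        weights = [1.0 / len(samples)] * len(samples)

    # In 1D, the barycenter's quantile function is the weighted average of the
    # clients' quantile functions, i.e. the average of their sorted samples.
    sorted_samples = [sorted(s) for s in samples]
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples))
        for i in range(n)
    ]


# Two clients whose entries have the same distribution but in a different order.
client_a = [0.0, 10.0]
client_b = [10.0, 0.0]
barycenter = wasserstein_barycenter_1d([client_a, client_b])
entrywise_mean = [(a + b) / 2 for a, b in zip(client_a, client_b)]
```

Here the entrywise mean collapses both entries to the same value, whereas the barycenter keeps the spread of values that the two clients share. This is one intuition for why a distribution-aware aggregate can behave differently from plain averaging.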