Feature Matching Data Synthesis for Non-IID Federated Learning

Zijian Li, Yuchang Sun, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, Jun Zhang
2023-08-09
Abstract: Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the problem of handling non-independent and identically distributed (non-IID) data in Federated Learning (FL). FL is a privacy-preserving approach that trains neural networks on edge devices without collecting data at a central server. Its inherent difficulty is that data distributions differ across devices. This data heterogeneity leads to biased local updates, which degrade the performance and accuracy of the global model.
To tackle this challenge, the paper proposes a Hard Feature Matching Data Synthesis (HFMDS) method, which generates auxiliary data that clients share in addition to their local models. Synthetic data are generated by learning the essential class-relevant features of real samples and discarding redundant features. To better protect privacy, the paper also introduces a hard feature augmentation method that shifts real features towards the decision boundary, so that the synthetic data both improve the model's generalization ability and erase information about the real features. By combining HFMDS with FL, the paper obtains a new FL framework that alleviates data heterogeneity through data augmentation.
The main contributions of the paper are:
1. Proposing a Feature Matching Data Synthesis (FMDS) method based on class-relevant feature matching, which reduces the training overhead of data synthesis and generates effective synthetic data.
2. Further proposing the Hard Feature Matching Data Synthesis (HFMDS) method, which enhances privacy protection and improves the effectiveness of synthetic data by shifting real features towards the decision boundary.
3. Combining the HFMDS method with Federated Learning to propose a new Federated Learning algorithm (HFMDS-FL), in which each client generates hard feature matching synthetic data and shares it with other clients, thereby alleviating data heterogeneity among clients.
4. Demonstrating the effectiveness of HFMDS-FL in feature alignment and domain adaptation through visualization results and theoretical analysis, and validating the framework's superior accuracy, privacy protection, and computational cost savings through simulation experiments on benchmark datasets.
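The two ideas summarized above, shifting a real feature towards the decision boundary and then synthesizing data whose feature matches the shifted one, can be illustrated with a toy sketch. This is not the paper's implementation: the linear feature extractor `W`, the linear classifier `V`, the use of cross-entropy gradient ascent to move the feature, the squared-error matching loss, and the names `harden_feature` and `synthesize_from_feature` are all assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def harden_feature(h, label, V, step=0.5, n_steps=5):
    """Move feature h towards the decision boundary of a linear classifier V.

    Assumption: "towards the boundary" is modeled as gradient ascent on the
    cross-entropy loss, stopping before the predicted class would change.
    """
    h = h.copy()
    for _ in range(n_steps):
        grad_logits = softmax(V @ h)
        grad_logits[label] -= 1.0           # d(loss)/d(logits) = p - onehot
        h_new = h + step * (V.T @ grad_logits)
        if np.argmax(V @ h_new) != label:   # keep the feature on the correct side
            break
        h = h_new
    return h

def synthesize_from_feature(h_target, W, n_steps=2000, seed=0):
    """Find an input x whose feature W @ x matches h_target.

    Assumption: a linear extractor and a squared-error matching loss,
    minimized by plain gradient descent starting from random noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=W.shape[1])
    lr = 0.5 / np.linalg.norm(W, 2) ** 2    # step size below the stability limit
    for _ in range(n_steps):
        residual = W @ x - h_target
        x -= lr * 2.0 * (W.T @ residual)    # gradient of ||W x - h_target||^2
    return x

# Toy usage: 16-dimensional inputs, 4-dimensional features, 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))                # hypothetical feature extractor
V = np.eye(3, 4)                            # hypothetical classifier head
h_real = np.array([2.0, 0.0, 0.0, 0.5])     # a feature classified as class 0
h_hard = harden_feature(h_real, label=0, V=V)
x_syn = synthesize_from_feature(h_hard, W)
```

In this sketch the synthetic input `x_syn` starts from noise and is fitted only to the shifted feature `h_hard`, never to the real sample itself, which loosely mirrors the privacy motivation described in the summary. A real system would use a neural feature extractor and the objectives defined in the paper.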