Abstract: Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.

What problem does this paper attempt to address?

The paper addresses the problem of non-independent and identically distributed (non-IID) data in Federated Learning (FL). FL is a privacy-preserving approach that trains neural networks on edge devices without collecting data at a central server. Because data distributions differ across devices, local updates become biased, which degrades the performance and accuracy of the global model.

To tackle this challenge, the paper proposes a Hard Feature Matching Data Synthesis (HFMDS) method, in which auxiliary synthetic data are shared alongside local models. The synthetic data are generated by learning the key class-relevant features of real samples and discarding redundant features. For better privacy preservation, the paper introduces a hard feature augmentation method that shifts real features towards the decision boundary, so that the synthetic data both improve the model's generalization ability and erase information about the real features. Combining HFMDS with FL yields a new FL framework that alleviates data heterogeneity through data augmentation.

The main contributions of the paper include:

1. Proposing a Feature Matching Data Synthesis (FMDS) method based on class-relevant feature matching, which reduces the training overhead of data synthesis and generates effective synthetic data.
2. Further proposing the Hard Feature Matching Data Synthesis (HFMDS) method, which enhances privacy protection and improves the effectiveness of the synthetic data by shifting real features towards the decision boundary.
3. Combining the HFMDS method with FL to propose a new FL algorithm (HFMDS-FL), in which each client generates hard feature matching synthetic data and shares it with other clients, thereby alleviating data heterogeneity among clients.
4. Demonstrating the effectiveness of HFMDS-FL in feature alignment and domain adaptation through visualization results and theoretical analysis, and validating the framework's accuracy, privacy protection, and computational cost savings through simulation experiments on benchmark datasets.
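The two ideas named above, shifting a real feature towards the decision boundary and then synthesizing an input whose feature matches the shifted target, can be illustrated with a toy sketch. This is not the paper's algorithm: the linear feature extractor `W`, the binary linear classifier `(w, b)`, the shift fraction `alpha`, and the helper names `shift_toward_boundary` and `synthesize` are all assumptions introduced here for illustration, and the sketch makes no claim about the privacy properties the paper analyzes.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def shift_toward_boundary(z, w, b, alpha):
    """Move feature z a fraction alpha of the way to the hyperplane w.z + b = 0.

    With 0 < alpha < 1 the feature keeps its class (same sign of the score)
    but ends up with a smaller margin, i.e. it becomes a "harder" feature.
    """
    score = dot(w, z) + b
    scale = alpha * score / dot(w, w)
    return [z_i - scale * w_i for z_i, w_i in zip(z, w)]


def synthesize(W, target, steps=500, lr=0.05, seed=0):
    """Gradient descent on ||W x - target||^2, starting from random noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(steps):
        residual = [f - t for f, t in zip(matvec(W, x), target)]
        # Gradient of the squared error with respect to x is 2 * W^T residual.
        grad = [
            2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))
        ]
        x = [x_j - lr * g_j for x_j, g_j in zip(x, grad)]
    return x


# Hypothetical toy setup: 4-dimensional inputs, 2-dimensional features.
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5]]
w, b = [1.0, 1.0], -1.0
x_real = [1.0, 2.0, 0.5, -1.0]

z_real = matvec(W, x_real)
z_hard = shift_toward_boundary(z_real, w, b, alpha=0.8)
x_syn = synthesize(W, z_hard)

print("real score:", dot(w, z_real) + b)
print("hard score:", dot(w, z_hard) + b)
print("synthetic feature:", matvec(W, x_syn))
```

In this toy setting the synthetic input reproduces the shifted feature rather than the real one, so it stays on the same side of the boundary as the real sample while differing from it in input space. A real implementation would replace the linear maps with the client's neural network and an appropriate matching loss.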

Feature Matching Data Synthesis for Non-IID Federated Learning

Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy Synthesizing Network

Federated Learning with GAN-based Data Synthesis for Non-IID Clients

GFL: Federated Learning on Non-IID data via Privacy-preserving Synthetic data

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Synthetic Data Aided Federated Learning Using Foundation Models

Privacy-Enhanced Federated Learning for Non-IID Data

A state-of-the-art survey on solving non-IID data in Federated Learning

Communication-Efficient Federated Data Augmentation on Non-IID Data

FedSea: Federated Learning Via Selective Feature Alignment for Non-IID Multimodal Data

FedFed: Feature Distillation Against Data Heterogeneity in Federated Learning

A Survey of Federated Learning on Non-IID Data

Federated Learning for Non-IID Data Via Unified Feature Learning and Optimization Objective Alignment

Towards Fast and Accurate Federated Learning with Non-Iid Data for Cloud-Based IoT Applications

Data Augmentation Scheme for Federated Learning with Non-Iid Data

MDFL: Model-Distance Federated Learning on Non-IID Data

Federated Learning with Non-IID Data: A Survey

Completely Heterogeneous Federated Learning

Data Collaborative Federated Learning for Non-i.i.d Data in Wireless Networks

Semi-Supervised Federated Learning with Non-Iid Data: Algorithm and System Design