Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong
2024-04-04
Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or knowledge of label distribution before the attack.
Cryptography and Security; Distributed, Parallel, and Cluster Computing; Machine Learning
What problem does this paper attempt to address?
This paper addresses the security threat posed by client-side training data distribution inference attacks in federated learning (FL). In this type of attack, a malicious client reconstructs the data distribution of other clients by analyzing model updates, thereby leaking sensitive information. Earlier defenses often assume that the server can access some training data or knows the label distribution before the attack, which is unrealistic in practice. To address this, the authors propose a Byzantine-robust aggregation rule named InferGuard. Its core idea is to compute the coordinate-wise median of all client model updates, then identify the updates that deviate significantly from that median and mark them as potentially malicious. The method aims to resist client-side training data distribution inference attacks without relying on additional data or prior knowledge, while keeping the FL system practical and efficient. The paper evaluates InferGuard experimentally on five benchmark datasets and reports that it holds up against strong adaptive attacks and significantly outperforms existing baseline methods across a variety of practical FL scenarios.
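The median-then-flag idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how "deviates significantly" is measured or how the remaining updates are combined. The Euclidean distance, the `factor` threshold, the averaging step, the function names, and the toy data are therefore all assumptions made for illustration.

```python
from math import sqrt
from statistics import median


def coordinate_wise_median(updates):
    """Return the per-coordinate median across all client updates."""
    return [median(coords) for coords in zip(*updates)]


def flag_suspicious(updates, factor=3.0):
    """Flag updates that lie unusually far from the coordinate-wise median.

    ASSUMPTION: the criterion (Euclidean distance greater than
    factor * median distance) is illustrative; the source does not
    specify how a significant deviation is defined.
    """
    center = coordinate_wise_median(updates)
    dists = [
        sqrt(sum((u - c) ** 2 for u, c in zip(update, center)))
        for update in updates
    ]
    threshold = factor * median(dists)
    flagged = [i for i, d in enumerate(dists) if d > threshold]
    return center, flagged


def aggregate_unflagged(updates, flagged):
    """ASSUMPTION: average the updates that were not flagged."""
    excluded = set(flagged)
    kept = [u for i, u in enumerate(updates) if i not in excluded]
    return [sum(coords) / len(kept) for coords in zip(*kept)]


# Hypothetical toy data: four similar updates and one outlier.
updates = [
    [0.10, -0.20, 0.30],
    [0.12, -0.18, 0.28],
    [0.09, -0.22, 0.31],
    [0.11, -0.19, 0.29],
    [5.00, 4.00, -6.00],  # stands in for a manipulated update
]
center, flagged = flag_suspicious(updates)
aggregated = aggregate_unflagged(updates, flagged)
print("median:", center)
print("flagged client indices:", flagged)
print("aggregate of remaining updates:", aggregated)
```

In this toy run only the last client is flagged, because its update lies far from the per-coordinate median formed by the other four. A real deployment would operate on flattened model-parameter vectors and would need a deviation rule taken from the paper itself.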