SoK: Gradient Leakage in Federated Learning

Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren
2024-04-08
Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under *ideal settings and auxiliary assumptions*, their actual efficacy against *practical FL systems* remains under-explored. To address this gap, we conduct a comprehensive study on GIAs in this work. We start with a survey of GIAs that establishes a milestone to trace their evolution and develops a systematization to uncover their inherent threats. Specifically, we categorize the auxiliary assumptions used by existing GIAs based on their practical accessibility to potential adversaries. To facilitate deeper analysis, we highlight the challenges that GIAs face in practical FL systems from three perspectives: *local training*, *model*, and *post-processing*. We then perform extensive theoretical and empirical evaluations of state-of-the-art GIAs across diverse settings, utilizing eight datasets and thirteen models. Our findings indicate that GIAs have inherent limitations when reconstructing data under practical local training settings. Furthermore, their efficacy is sensitive to the trained model, and even simple post-processing measures applied to gradients can be effective defenses. Overall, our work provides crucial insights into the limited effectiveness of GIAs in practical FL systems. By rectifying prior misconceptions, we hope to inspire more accurate and realistic investigations on this topic.
Cryptography and Security, Artificial Intelligence
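The abstract's starting point, that shared gradients can leak training data, is easiest to see in the simplest possible case. The following is a minimal Python sketch, assuming a single fully connected layer y = Wx + b trained with squared error. Because dL/dW_i = (dL/dy_i)·x and dL/db_i = dL/dy_i, dividing one row of the weight gradient by the matching bias gradient recovers the input exactly when the batch holds one sample. With several samples, the same ratio only returns a weighted combination of the inputs, which gives one simple intuition for why realistic local training makes reconstruction harder. The layer sizes, the random data, and the helper names `linear_grads`, `invert_row`, and `sample` are hypothetical; this analytic trick illustrates gradient leakage in general and is not taken from the paper's evaluated attacks.

```python
import random

def linear_grads(W, b, batch, targets):
    """Gradients of the batch-mean squared error for y = W x + b.

    Returns (dW, db) with dW[i][j] = dL/dW[i][j] and db[i] = dL/db[i].
    """
    n = len(batch)
    out_dim, in_dim = len(W), len(W[0])
    dW = [[0.0] * in_dim for _ in range(out_dim)]
    db = [0.0] * out_dim
    for x, t in zip(batch, targets):
        for i in range(out_dim):
            y_i = sum(W[i][j] * x[j] for j in range(in_dim)) + b[i]
            g_i = 2.0 * (y_i - t[i]) / n  # dL/dy_i contributed by this sample
            db[i] += g_i                  # bias gradient accumulates g_i
            for j in range(in_dim):
                dW[i][j] += g_i * x[j]    # weight gradient accumulates g_i * x_j
    return dW, db

def invert_row(dW, db, i):
    """Analytic inversion x_hat = dL/dW_i / dL/db_i (requires db[i] != 0)."""
    return [w / db[i] for w in dW[i]]

rng = random.Random(0)
in_dim, out_dim = 4, 3
W = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
b = [rng.uniform(-1, 1) for _ in range(out_dim)]

def sample(dim):
    """Hypothetical private data: uniform values in [0, 1]."""
    return [rng.uniform(0, 1) for _ in range(dim)]

# Batch size 1: the shared gradient reveals the input exactly.
x, t = sample(in_dim), sample(out_dim)
dW, db = linear_grads(W, b, [x], [t])
i = max(range(out_dim), key=lambda k: abs(db[k]))  # row with the largest bias gradient
x_rec = invert_row(dW, db, i)

# Batch size 4: the same ratio yields only a weighted mix of the four inputs.
batch = [sample(in_dim) for _ in range(4)]
targets = [sample(out_dim) for _ in range(4)]
dW_b, db_b = linear_grads(W, b, batch, targets)
i_b = max(range(out_dim), key=lambda k: abs(db_b[k]))
x_mix = invert_row(dW_b, db_b, i_b)

print(max(abs(a - c) for a, c in zip(x, x_rec)))  # tiny: floating-point rounding only
```

In the batched case, `x_mix` equals sum_n g_n·x_n / sum_n g_n, where g_n is sample n's dL/dy_i, so in general it matches none of the inputs in `batch`; because the weights g_n can be negative, it need not even lie between them.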
What problem does this paper attempt to address?
This paper examines the risk of gradient leakage in federated learning (FL), focusing on how effective and how threatening gradient inversion attacks (GIAs) actually are in practical FL systems. Specifically:

1. **Background and Problem**:
   - Federated learning lets multiple clients collaboratively train a model without exposing their raw data.
   - Recent research has shown that the gradients clients share can be used to reconstruct their private training data; such attacks are called gradient inversion attacks (GIAs).
   - Existing GIA studies are usually conducted under idealized conditions and rely on auxiliary assumptions, so the effectiveness of these attacks against practical FL systems has not been fully explored.
2. **Research Objectives**:
   - The paper aims to assess the real threats and limitations of GIAs in practical FL systems through a comprehensive study.
   - Specifically, the authors seek to answer the following research questions:
     - **RQ1**: How do clients' training configurations affect the ability of GIAs to reconstruct data?
     - **RQ2**: How do GIAs perform on high-dimensional data?
     - **RQ3**: Are GIAs still effective when reconstructing data with diverse content?
     - **RQ4**: How do GIAs perform against models at different stages of FL training?
     - **RQ5**: How does model architecture affect resistance to GIAs?
     - **RQ6**: Can gradient post-processing techniques naturally defend against GIAs while preserving model utility?
3. **Methods and Contributions**:
   - **Systematizing GIAs**: The authors classify and summarize existing GIAs along three dimensions (threat model, attack method, and defense measures), and categorize the auxiliary assumptions by how accessible they are to potential adversaries in practice.
   - **Re-examining the Performance of GIAs in Practical FL**: The paper analyzes the challenges GIAs face in practical FL systems from three perspectives: local training, model, and post-processing.
   - **Theoretical and Empirical Analysis**: The paper evaluates the performance of existing GIAs across different settings through extensive experiments on eight datasets and thirteen models.
4. **Main Findings**:
   - **Data Reconstruction Bottleneck**: Reconstruction becomes harder for GIAs as the number of local training rounds increases and as the data dimensionality grows.
   - **Model Sensitivity**: GIAs are highly sensitive to the model's training stage and architecture, and are most likely to succeed in the early stages of training.
   - **Effectiveness of Post-Processing**: Simple gradient post-processing techniques, such as quantization and sparsification, effectively defend against GIAs while maintaining model accuracy.

In conclusion, through in-depth analysis and empirical research, this paper shows that GIAs have limited effectiveness in practical FL systems, providing important insights for more accurate and realistic future research on this topic.
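The post-processing finding can be illustrated with the same kind of analytic inversion used for a single fully connected layer with bias: for one output unit i, the shared gradient is (dL/dy_i)·x for the weights and dL/dy_i for the bias, so their ratio returns the private input x. The minimal Python sketch below applies top-k sparsification and a simple uniform quantizer (one of many possible variants) to that gradient before inversion. The input values, `k`, the number of quantization levels, and the function names `top_k_sparsify`, `uniform_quantize`, and `invert` are hypothetical choices. Real systems apply these operations to the whole model update, and this sketch says nothing about model accuracy, which the paper evaluates separately.

```python
def top_k_sparsify(values, k):
    """Keep the k entries with the largest magnitude and zero the rest."""
    order = sorted(range(len(values)), key=lambda j: abs(values[j]), reverse=True)
    keep = set(order[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(values)]

def uniform_quantize(values, levels):
    """Round each entry to one of `levels` (>= 2) evenly spaced values in [-m, m],
    where m is the largest magnitude in `values`."""
    m = max(abs(v) for v in values)
    if m == 0.0:
        return list(values)
    step = 2.0 * m / (levels - 1)
    return [round((v + m) / step) * step - m for v in values]

def invert(flat_grad):
    """Analytic inversion for one output unit of a fully connected layer.

    flat_grad = [dL/dW_i0, ..., dL/dW_i(d-1), dL/db_i]; returns dL/dW_i / dL/db_i,
    or None when the bias gradient has been zeroed and the ratio is undefined.
    """
    *dw, db = flat_grad
    if db == 0.0:
        return None
    return [w / db for w in dw]

x = [0.9, 0.1, 0.5, 0.3]   # hypothetical private input
g = 0.5                    # hypothetical dL/dy_i for this output unit
shared = [g * xj for xj in x] + [g]  # weight-row gradient followed by bias gradient

print(invert(shared))                       # [0.9, 0.1, 0.5, 0.3]
print(invert(top_k_sparsify(shared, 3)))    # [0.9, 0.0, 0.5, 0.0]
print(invert(uniform_quantize(shared, 5)))  # [1.0, 0.0, 0.5, 0.5]
```

Sparsification zeroes two coordinates of the recovered input, and quantization snaps the remaining ones to a coarse grid, so neither leaves the attacker with the exact input. Dropping the bias entry as well (for example, `top_k_sparsify(shared, 0)`) makes the ratio undefined, which `invert` reports as `None`.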