Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense

Wenjie Li, Kai Fan, Jingyuan Zhang, Hui Li, Wei Yang Bryan Lim, Qiang Yang
2024-05-29
Abstract: Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named Federated Learning with Update Digest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $\ell_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two key problems in Federated Learning (FL): privacy protection and Byzantine fault tolerance.

### 1. Privacy Protection

In FL, although data remains local, the server can still infer sensitive information by analyzing the model updates that clients upload. For example, gradient inversion attacks such as DLG [14] and InvertGrad [16] use gradient information to reconstruct the original training data. Ensuring that uploaded updates do not leak private information is therefore an important issue.

### 2. Byzantine Fault Tolerance

FL is also threatened by malicious clients or servers. These entities may tamper with local data or modify the training process to generate harmful model updates (i.e., Byzantine attacks). Such attacks can steer the global model away from the correct training direction and even embed persistent backdoors. The Byzantine attacks named here include:

- **Simple untargeted attacks**: LabelFlipping, SignFlipping, Noise-(0,1), etc.
- **Crafted model-poisoning attacks**: ALIE, MinMax, IPM-0.1, IPM-100, etc. These are also untargeted: they aim to degrade the global model rather than plant a backdoor.

To address these problems, the paper proposes a new framework, **Federated Learning with Update Digest (FLUD)**. Its core innovations are:

#### A. Update Digest Calculation Method (LinfSample)

FLUD introduces the LinfSample method, in which each client slides a window over its model update and takes the \(\ell_\infty\) norm (the largest absolute value) of each window as one entry of the update digest:

\[
d_{i,j} = \max_{k = 0,1,\ldots,s-1} \left| f_{i,\, s \cdot j + k} \right|, \quad j = 0,1,\ldots,l'-1
\]

Here, \(f_i\) is the model update vector of client \(i\) (entries indexed from 0), \(d_{i,j}\) is the \(j\)-th entry of its digest, \(s\) is the window size, and \(l'\) is the length of the update digest. This method enables the server to identify malicious updates without accessing the complete update.

#### B. Privacy-Preserving Voting Defense Mechanism

FLUD integrates a privacy-preserving voting mechanism that uses optimized Secure Multi-Party Computation (SMPC) protocols to minimize the number of communication rounds. Specifically, the server computes a voting matrix from the shared distance matrix and screens out malicious updates by computing the median of each row in parallel. This improves security while keeping the overhead of the secure computation low.

### Summary

The FLUD framework addresses privacy protection and Byzantine fault tolerance in Federated Learning by combining the update digest with the voting mechanism. Experimental results show that FLUD resists Byzantine attacks with low communication and runtime overhead, making it suitable for distributed environments that require robust data management and security.

References:

- [14] DLG: Deep Leakage from Gradients
- [16] InvertGrad: Inverting Gradients - How Easy is it to Break Privacy in Federated Learning?
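
To make the digest and voting steps from sections A and B more concrete, here is a minimal plaintext sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: FLUD runs the distance and voting steps under SMPC on secret-shared values, whereas this code works on plain lists. The function names (`linf_sample`, `distance_matrix`, `vote_filter`), the squared Euclidean distance, the non-overlapping windows, and the exact voting and keep rules are all hypothetical choices made for this example.

```python
from statistics import median


def linf_sample(update, s):
    """Digest of `update`: largest |value| in each consecutive window of size s.

    Assumption: windows do not overlap (stride == s), and a shorter final
    window is allowed when len(update) is not a multiple of s.
    """
    if s <= 0:
        raise ValueError("window size s must be positive")
    return [max(abs(v) for v in update[start:start + s])
            for start in range(0, len(update), s)]


def distance_matrix(digests):
    """Pairwise squared Euclidean distances between digests.

    Assumption: the metric is chosen for illustration only; the summary
    above just says a "shared distance matrix" is computed.
    """
    n = len(digests)
    return [[sum((a - b) ** 2 for a, b in zip(digests[i], digests[j]))
             for j in range(n)] for i in range(n)]


def vote_filter(dist):
    """Hypothetical voting rule built on row medians.

    Row i votes for every update j whose distance to i is at most the
    median of row i. Updates that collect at least the median number of
    votes are kept. Returns (kept_indices, votes).
    """
    n = len(dist)
    row_medians = [median(row) for row in dist]
    votes = [sum(1 for i in range(n) if dist[i][j] <= row_medians[i])
             for j in range(n)]
    threshold = median(votes)
    kept = [j for j in range(n) if votes[j] >= threshold]
    return kept, votes


# Toy data: four similar (benign) updates and one large outlier.
updates = [
    [1, -2, 0, 3, 1, -1],
    [1, -2, 1, 3, 0, -1],
    [2, -2, 0, 3, 1, -2],
    [1, -3, 0, 3, 1, -1],
    [50, -40, 60, -70, 80, -90],  # stand-in for a malicious update
]
digests = [linf_sample(u, 2) for u in updates]
kept, votes = vote_filter(distance_matrix(digests))
print(digests[0], votes, kept)
```

On this toy input the first digest is `[2, 3, 1]`, the vote counts are `[4, 4, 4, 4, 1]`, and the kept indices are `[0, 1, 2, 3]`, so the outlier is filtered out. The digests have length 3 instead of 6, which hints at why comparing digests rather than full updates shrinks the amount of data the secure computation has to process.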