Efficient Byzantine-Robust and Provably Privacy-Preserving Federated Learning

Chenfei Nie, Qiang Li, Yuxin Yang, Yuede Ji, Binghui Wang
2024-07-29
Abstract: Federated learning (FL) is an emerging distributed learning paradigm in which participating clients do not share their private data. However, existing works show that FL is vulnerable to both Byzantine (security) attacks and data reconstruction (privacy) attacks. Almost all existing FL defenses address only one of the two attacks. A few defenses address both, but they are not sufficiently efficient or effective. We propose BPFL, an efficient Byzantine-robust and provably privacy-preserving FL method that addresses all these issues. Specifically, we draw on state-of-the-art Byzantine-robust FL methods and use similarity metrics to measure the robustness of each participating client in FL. The validity of each client is formulated as circuit constraints on the similarity metrics and verified via a zero-knowledge proof. Moreover, the client models are masked by a shared random vector, which is generated based on homomorphic encryption. In doing so, the server receives the masked client models rather than the true ones, and these masked models are proven to be private. BPFL is also efficient due to its use of a non-interactive zero-knowledge proof. Experimental results on various datasets show that BPFL is efficient, Byzantine-robust, and privacy-preserving.
Cryptography and Security
What problem does this paper attempt to address?
The paper addresses two key threats to Federated Learning (FL), Byzantine attacks (a security issue) and data reconstruction attacks (a privacy issue), and proposes a new method, BPFL (Byzantine-robust and Privacy-Preserving Federated Learning), that targets three goals:

1. **Byzantine Robustness**: The server can detect invalid or malicious local models submitted by clients and exclude them from global model aggregation.
2. **Privacy Preservation**: Throughout federated training, the server cannot infer the clients' private data.
3. **Efficiency**: The method should not incur excessive computational or communication overhead.

To achieve these goals, BPFL combines the following techniques:

- **Zero-Knowledge Proof (ZKP)** verifies the validity of client models, using similarity metrics such as cosine similarity and Euclidean distance to determine whether a model has been affected by a Byzantine attack.
- **Homomorphic Encryption (HE)** is used to generate a random mask vector that hides the client models from the server, preventing privacy leakage.
- A **Mask Vector Negotiation Protocol (MVNP)** ensures that all clients obtain the same random vector for masking while keeping that vector confidential from the server.
- **Hash functions** let the server verify that the random vector each client uses to generate its proof is the one actually negotiated through MVNP, which prevents malicious clients from forging proofs.

The paper also details the BPFL workflow: a setup phase, a local training phase, a client proof generation and submission phase, and a server proof verification and aggregation phase. It further analyzes the time and communication complexity of BPFL and provides a theoretical security analysis.
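The similarity-based validity check can be illustrated in plaintext. The sketch below is a minimal Python illustration under stated assumptions, not the paper's circuit: the reference vector, the thresholds `cos_min` and `dist_max`, and the helper names are all hypothetical. In BPFL the predicate would be expressed as circuit constraints and proven in zero knowledge, rather than evaluated by the server on the plaintext model as done here.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two flattened model updates."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # treat an all-zero vector as having no directional agreement
    return float(np.dot(a, b) / denom)


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 distance between two flattened model updates."""
    return float(np.linalg.norm(a - b))


def is_valid_update(update, reference, cos_min=0.0, dist_max=1.0):
    """Plaintext stand-in for the validity predicate a client would prove in ZK.

    Hypothetical: `reference`, `cos_min`, and `dist_max` are illustrative
    parameters, not values taken from the paper.
    """
    return (cosine_similarity(update, reference) >= cos_min
            and euclidean_distance(update, reference) <= dist_max)


reference = np.array([1.0, 0.5, -0.2])  # hypothetical reference update
honest = np.array([0.9, 0.6, -0.1])     # close in direction and distance
flipped = -3.0 * reference              # a sign-flipping, scaled (Byzantine-style) update

print(is_valid_update(honest, reference))   # True
print(is_valid_update(flipped, reference))  # False
```

Arithmetic circuits operate over finite fields, so a circuit version would usually encode weights in fixed point and compare squared quantities instead of computing square roots and divisions; this sketch ignores that detail.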
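The masking and hash-check ideas can be sketched as follows, assuming the shared mask `r` has already been agreed on through MVNP (the HE-based negotiation itself is not modeled). The helper names `commit` and `mask`, the use of SHA-256, additive masking `w_i + r`, and the assumption that clients strip `r` from the aggregate are illustrative choices, not details confirmed by the summary.

```python
import hashlib

import numpy as np


def commit(vector: np.ndarray) -> str:
    """SHA-256 digest of a mask vector (hypothetical stand-in for the hash check)."""
    return hashlib.sha256(vector.tobytes()).hexdigest()


def mask(update: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Additively mask a client update with the shared random vector r."""
    return update + r


rng = np.random.default_rng(seed=0)
dim = 4

# Assumption: r was produced by MVNP and its digest is known to the server.
r = rng.normal(size=dim)
r_commitment = commit(r)

# Three honest clients mask their updates and report the digest of the mask they used.
updates = [rng.normal(size=dim) for _ in range(3)]
masked = [mask(u, r) for u in updates]
claimed = [commit(r) for _ in updates]

# Server side: accept only submissions whose mask digest matches the negotiated one.
assert all(c == r_commitment for c in claimed)

# The server averages masked models: mean(w_i + r) = mean(w_i) + r.
masked_avg = np.mean(masked, axis=0)

# Assumption: clients, who know r, remove it from the aggregate.
global_update = masked_avg - r
print(np.allclose(global_update, np.mean(updates, axis=0)))  # True

# A client that builds its proof with a different vector is caught by the digest check.
forged = rng.normal(size=dim)
print(commit(forged) == r_commitment)  # False
```

Because every client adds the same `r`, averaging the masked models yields the true average plus `r`, while a client that substitutes a different vector produces a digest that does not match the negotiated one.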
In summary, by integrating existing Byzantine-robust federated learning methods, zero-knowledge proofs, and homomorphic encryption, the paper aims to design a federated learning scheme that effectively defends against Byzantine attacks, protects user privacy, and remains efficient.