Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs
Yizheng Zhu,Yuncheng Wu,Zhaojing Luo,Beng Chin Ooi,Xiaokui Xiao
DOI: https://doi.org/10.14778/3665844.3665860
IF: 2.5
2024-05-01
Proceedings of the VLDB Endowment
Abstract:Federated Learning (FL) emerges as a viable solution to facilitate data collaboration, enabling multiple clients to collaboratively train a machine learning (ML) model under the supervision of a central server while ensuring the confidentiality of their raw data. However, existing studies have unveiled two main risks: (i) the potential for the server to infer sensitive information from the client's uploaded updates (i.e., model gradients), compromising client input privacy, and (ii) the risk of malicious clients uploading malformed updates to poison the FL model, compromising input integrity. Recent works utilize secure aggregation with zero-knowledge proofs (ZKP) to guarantee input privacy and integrity in FL. Nevertheless, they suffer from extremely low efficiency and, thus, are impractical for real deployment. In this paper, we propose a novel and highly efficient approach RiseFL for secure and verifiable data collaboration, ensuring input privacy and integrity simultaneously. Firstly, we devise a probabilistic integrity check method that transforms strict checks into a hypothesis test problem, offering great optimization opportunities. Secondly, we introduce a hybrid commitment scheme to satisfy Byzantine robustness with improved performance. Thirdly, we present an optimized ZKP generation and verification technique that significantly reduces the ZKP cost based on probabilistic integrity checks. Furthermore, we theoretically prove the security guarantee of RiseFL and provide a cost analysis compared to state-of-the-art baselines. Extensive experiments on synthetic and real-world datasets suggest that our approach is effective and highly efficient in both client computation and communication. For instance, RiseFL is up to 28x, 53x, and 164x faster than baselines ACORN, RoFL, and EIFFeL for the client computation.
computer science, information systems, theory & methods