VPAS: Publicly Verifiable and Privacy-Preserving Aggregate Statistics on Distributed Datasets

Mohammed Alghazwi, Dewi Davies-Batista, Dimka Karastoyanova, Fatih Turkmen
2024-03-22
Abstract: Aggregate statistics play an important role in extracting meaningful insights from distributed data while preserving privacy. A growing number of application domains, such as healthcare, utilize these statistics in advancing research and improving patient care.
Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is how to ensure input validation and public verifiability when computing aggregate statistics on distributed datasets, while still protecting privacy. Specifically, the paper targets two core problems:

1. **Input Validation**:
   - When secure aggregation is deployed in practice, some clients may be compromised, so the validity of the input data and the correctness of the computed results must be ensured. This means preventing clients from manipulating the results by submitting malicious or malformed data.
   - The paper proposes using zero-knowledge proofs (ZKPs), in particular general-purpose zero-knowledge succinct non-interactive arguments of knowledge (zkSNARKs), to perform broader validation tasks than merely checking that an input lies within a certain range.
2. **Public Verifiability**:
   - When aggregate statistics are used in critical areas such as healthcare, ensuring their verifiability, and especially their public verifiability, is crucial. Public verifiability allows any third party, including non-participants, to confirm that the results were computed correctly according to the protocol specification, without accessing the underlying data.
   - Public verifiability removes the need to re-run the computation, and it supports auditing by recording data usage, the functions computed, and how results are distributed, which in turn promotes responsible data sharing.

To achieve these goals, the authors propose a protocol named VPAS (Publicly Verifiable and Privacy-Preserving Aggregate Statistics). The protocol uses homomorphic encryption to protect data privacy, and ZKPs together with blockchain systems for input validation and public verifiability.
VPAS constructs a lightweight protocol by extending existing verifiable encryption schemes, enabling \(N\) clients to encrypt their data, aggregate it, and finally publish the results to the collector in a verifiable manner. In addition, the paper presents an application of VPAS to genomics research, specifically genome-wide association studies (GWAS), showing how input validation, computational correctness, and public verifiability can be ensured while protecting privacy. Experimental evaluations show that VPAS improves performance by a factor of 10 compared with the traditional zkSNARK approach, making input validation and public verifiability achievable in a wider range of application scenarios at moderate computational cost.
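
To make the "encrypt, then aggregate" idea concrete, the following is a minimal sketch of additively homomorphic aggregation. It is an illustration only and makes several assumptions that do not come from the paper: it uses a toy Paillier-style scheme (the summary does not say which encryption scheme VPAS uses), the key is built from two small, publicly known primes and is not secure, and the names `encrypt`, `add_ciphertexts`, `aggregate`, and `client_counts` are hypothetical. The zero-knowledge proofs and blockchain components are omitted entirely.

```python
import math
import secrets

# ASSUMPTION: toy Paillier-style parameters, chosen only so the sketch runs.
# Both primes are well-known Mersenne primes; far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the toy public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 equals 1 + m*N.
    return (1 + m * N) % N_SQ * pow(r, N, N_SQ) % N_SQ


def add_ciphertexts(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return c1 * c2 % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the toy secret values LAM and MU."""
    u = pow(c, LAM, N_SQ)
    return (u - 1) // N * MU % N


def aggregate(ciphertexts):
    """Combine ciphertexts into one that encrypts the sum of all inputs."""
    total = encrypt(0)
    for c in ciphertexts:
        total = add_ciphertexts(total, c)
    return total


# Hypothetical per-client values, e.g. local counts contributed to a statistic.
client_counts = [12, 7, 30]
encrypted = [encrypt(v) for v in client_counts]
aggregate_ct = aggregate(encrypted)
result = decrypt(aggregate_ct)
print(result)
```

In this sketch the aggregator only ever handles ciphertexts, and only the key holder learns the sum of the three inputs. What the sketch cannot show is the part the paper focuses on: proving that each encrypted input is well formed and that the aggregation was carried out correctly, in a way any third party can check.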