Differentially Private Set Intersection for Asymmetrical ID Alignment
Yuanyuan He, Xinyu Tan, Jianbing Ni, Laurence T. Yang, Xianjun Deng
DOI: https://doi.org/10.1109/tifs.2022.3207911
IF: 7.231
2022-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Private Set Intersection (PSI) is typically used to align IDs while protecting them in the preparation phase of Vertical Federated Learning (VFL). However, existing PSI approaches protect only the IDs outside the participants' intersection, and most ignore how sensitive the intersection itself is for the weak party in an asymmetrical ID alignment. In an asymmetrical federation, the strong party's set is much larger than the weak party's, and the intersection usually accounts for a substantial part of the weak party's set, so sharing the intersection would severely compromise the weak party's sensitive sample IDs. To address this issue, we propose Differentially private PSI Cardinality (DPSI-CA) and Differentially private PSI (DPSI) protocols, which protect the intersection cardinality and the sensitive IDs inside the intersection for the weak party, respectively. First, DPSI-CA encodes IDs in binary notation and combines them with GM encryption to perform ID matching via bitwise XOR on the underlying plaintexts. Then, the encrypted matching results are independently perturbed using randomized response to produce differentially private outputs for PSI-CA, and an unbiased estimate is applied to remove the deviation introduced by the randomization. Furthermore, DPSI fuses Pseudo-Random Function (PRF)-based zero sharing, a garbled Bloom filter, and Oblivious PRF (OPRF)-based share reconstruction to reconstruct the shares corresponding to sampled IDs in the intersection. Meanwhile, randomized response is used to sample the inputs and perturb the outputs of the OPRF-based share reconstruction, producing a randomly sampled intersection for the weak party and a differentially private intersection for the strong party. Finally, the privacy analysis shows that our protocols provide differential privacy for the weak party's sensitive sample IDs, and extensive experimental results demonstrate the feasibility of asymmetrical ID alignment involving millions of IDs.
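
The randomized-response step and its unbiased correction, as described for DPSI-CA, can be illustrated with a minimal plaintext sketch. It is not the protocol itself: the paper perturbs encrypted matching results, whereas this sketch works on plain 0/1 match indicators. The function names, the keep probability e^ε / (1 + e^ε), and the estimator below are illustrative assumptions based on standard binary randomized response, not details taken from the abstract.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully (standard binary randomized response)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Independently keep each 0/1 match indicator with probability p, else flip it."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_cardinality(noisy_bits, epsilon):
    """Unbiased estimate of the number of true 1s behind the perturbed bits.

    If c of the n true bits are 1, the expected noisy sum is
    (1 - p) * n + (2p - 1) * c; solving for c gives the estimator.
    """
    p = keep_probability(epsilon)
    n = len(noisy_bits)
    return (sum(noisy_bits) - (1.0 - p) * n) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only so the demo is repeatable
    true_bits = [1] * 5000 + [0] * 15000  # hypothetical: 5000 matches out of 20000 IDs
    noisy = randomized_response(true_bits, epsilon=1.0, rng=rng)
    print("raw noisy count:", sum(noisy))
    print("debiased estimate:", round(estimate_cardinality(noisy, epsilon=1.0)))
```

The raw noisy count is biased toward n/2 because bits are flipped in both directions. The estimator inverts that expected shift, so its error shrinks relative to the true cardinality as the number of IDs grows.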