Abstract: Private Set Intersection (PSI) is typically used to achieve ID alignment, while protecting the IDs, in the preparation phase of Vertical Federated Learning (VFL). However, existing PSI approaches protect only the IDs outside the intersection of the participants, and most ignore the sensitivity of the intersection itself for the weak party in an asymmetrical ID alignment. In an asymmetrical federation, the strong party's set is much larger than the weak party's, and the intersection usually accounts for a substantial part of the weak party's set. Sharing the intersection therefore severely compromises the weak party's sensitive sample IDs. To address this issue, we propose the Differentially private PSI Cardinality (DPSI-CA) and Differentially private PSI (DPSI) protocols, which protect, respectively, the intersection cardinality and the sensitive IDs inside the intersection for the weak party. First, DPSI-CA encodes IDs in binary and combines them with Goldwasser-Micali (GM) encryption, performing ID matching by homomorphically computing the bitwise XOR of the plaintexts. Then, the encrypted matching results are independently perturbed using randomized response to produce differentially private PSI-CA outputs, and an unbiased estimator is applied to remove the bias introduced by the randomization. Furthermore, DPSI combines Pseudo-Random Function (PRF)-based zero sharing, a garbled Bloom filter, and Oblivious PRF (OPRF)-based share reconstruction to reconstruct the shares corresponding to the sampled IDs in the intersection. Meanwhile, randomized response is used to sample the inputs to, and perturb the outputs of, the OPRF-based share reconstruction, yielding a randomly sampled intersection for the weak party and a differentially private intersection for the strong party. Finally, the privacy analysis shows that our protocols provide differential privacy for the weak party's sensitive sample IDs, and extensive experimental results demonstrate the feasibility of asymmetrical ID alignment involving millions of IDs.
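The abstract does not give the exact perturbation parameters, so the following is only a minimal sketch of the general idea behind the differentially private cardinality output, not the authors' protocol. It assumes classic binary randomized response, where each per-ID match bit is reported truthfully with probability p = e^ε / (1 + e^ε) and flipped otherwise, and it replaces the GM-encrypted matching with a plaintext set-membership stand-in. Since the expected noisy sum is E[S] = (2p - 1)c + (1 - p)n for n bits with true count c, the estimator ĉ = (S - (1 - p)n) / (2p - 1) is unbiased. The helper names (`match_bits`, `keep_probability`, `randomized_response`, `unbiased_cardinality`) are hypothetical.

```python
from __future__ import annotations

import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def match_bits(weak_ids: list[int], strong_ids: list[int]) -> list[int]:
    """Hypothetical plaintext stand-in for the encrypted ID matching:
    1 if a weak-party ID is in the strong party's set, else 0."""
    strong = set(strong_ids)
    return [1 if x in strong else 0 for x in weak_ids]


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability p = e^eps / (1 + e^eps), otherwise
    flip it; the ratio p / (1 - p) = e^eps gives epsilon-DP for each bit."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def unbiased_cardinality(noisy_bits: list[int], epsilon: float) -> float:
    """Debias the noisy count: E[S] = (2p - 1) * c + (1 - p) * n, so
    c_hat = (S - (1 - p) * n) / (2p - 1). Requires epsilon > 0."""
    n = len(noisy_bits)
    p = keep_probability(epsilon)
    return (sum(noisy_bits) - (1.0 - p) * n) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(7)
    weak_ids = list(range(1_000))
    strong_ids = list(range(300, 100_000))  # strong set far larger than weak set
    true_bits = match_bits(weak_ids, strong_ids)  # true cardinality is 700
    noisy = [randomized_response(b, 1.0, rng) for b in true_bits]
    print("true:", sum(true_bits), "estimate:", round(unbiased_cardinality(noisy, 1.0), 1))
```

With ε = 1, each match bit is kept with probability of about 0.73. A smaller ε flips more bits: the estimate stays unbiased, but its variance grows by a factor of 1 / (2p - 1)^2 relative to the noisy count.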

Differentially Private Set Intersection for Asymmetrical ID Alignment

Not Just Summing: The Identifier Leakage of Private-Join-and-Compute and Its Improvement

Privacy Protection Based on Special Identifiers of Intersection Base Computing Technology

Fundamental Limits of Multi-Message Private Computation

Efficient multi-party private set intersection protocols for large participants and small sets

Nearly Optimal Protocols for Computing Multi-party Private Set Union

P²FRPSI: Privacy-Preserving Feature Retrieved Private Set Intersection

t-PSI: Efficient Multi-party Private Set Intersection with Threshold.

Incentive and Unconditionally Anonymous Identity-Based Public Provable Data Possession.

A Framework of Private Set Intersection Protocols

Lightweight Threshold Private Set Intersection Via Oblivious Transfer

A Secure and Lightweight Multi-Party Private Intersection-Sum Scheme over a Symmetric Cryptosystem

Multi-Server Weakly-Private Information Retrieval