Secure Dataset Condensation for Privacy-Preserving and Efficient Vertical Federated Learning

Dashan Gao, Canhui Wu, Xiaojin Zhang, Xin Yao, Qiang Yang
DOI: https://doi.org/10.1007/978-3-031-70341-6_13
2024-01-01
Abstract: This work addresses the dual challenges of enhancing training efficiency and protecting data privacy in Vertical Federated Learning (VFL) through secure synthetic dataset generation. VFL typically involves an active party with labels collaborating with a passive party possessing features of the same set of samples. Traditional VFL methods, however, rely on training with entire datasets of sensitive real data, leading to two primary issues: 1) reduced training efficiency due to large dataset sizes, a concern exacerbated in cryptography-based training methods; and 2) potential privacy leakage at the sample level during training. To mitigate these issues, we introduce the Vertical Federated Dataset Condensation (VFDC) method. VFDC employs a novel mixed protection mechanism, integrating class-wise secure aggregation, differential privacy and repetitive initialization, to securely match the distributions of real and synthetic data. Empirical evaluations on six real-world datasets validate VFDC's efficacy in generating small synthetic data for VFL, achieving a superior utility-privacy-efficiency trade-off during federated training.
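The idea of matching real and synthetic distributions through noisy class-wise statistics can be illustrated with a minimal plaintext sketch. This is not the authors' VFDC protocol: it omits the secure aggregation and repetitive initialization steps entirely, and the function names, the L2 clipping step, and the parameters `clip_norm` and `noise_std` are assumptions made for illustration. The noise scale is not calibrated to any particular privacy budget.

```python
import numpy as np

def noisy_class_means(features, labels, num_classes,
                      clip_norm=1.0, noise_std=0.1, rng=None):
    """Per-class mean of L2-clipped feature rows, perturbed with Gaussian noise.

    Hypothetical helper: clipping bounds each sample's influence, and the
    added noise stands in for a differential-privacy mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Scale each row so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            means[c] = clipped[mask].mean(axis=0)
    # Only these noisy aggregates would be released, never individual rows.
    return means + rng.normal(0.0, noise_std, size=means.shape)

def matching_loss(real_means, synthetic, synthetic_labels, num_classes):
    """Sum over classes of the squared distance between the (noisy) real
    class mean and the mean of the synthetic samples of that class."""
    loss = 0.0
    for c in range(num_classes):
        mask = synthetic_labels == c
        if mask.any():
            diff = real_means[c] - synthetic[mask].mean(axis=0)
            loss += float(diff @ diff)
    return loss
```

In a condensation loop, the synthetic samples would be updated to reduce `matching_loss`, so that only class-level aggregates of the real data influence the synthetic set.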