Defending Batch-Level Label Inference and Replacement Attacks in Vertical Federated Learning

Tianyuan Zou, Yang Liu, Yan Kang, Wenhan Liu, Yuanqin He, Zhihao Yi, Qiang Yang, Ya-Qin Zhang
DOI: https://doi.org/10.1109/tbdata.2022.3192121
2022-01-01
IEEE Transactions on Big Data
Abstract: In a vertical federated learning (VFL) scenario, where features and models are split among different parties, it has been shown that sample-level gradient information can be exploited to deduce crucial label information that should be kept secret. An immediate defense strategy is to protect the sample-level messages that are communicated with Homomorphic Encryption (HE), exposing only batch-averaged local gradients to each party. In this paper, we show that even with HE-protected communication, private labels can still be reconstructed with high accuracy by a gradient inversion attack, contrary to the common belief that batch-averaged information is safe to share under encryption. We then show that a backdoor attack can also be conducted by directly replacing encrypted communicated messages without decryption. To tackle these attacks, we propose a novel defense method, Confusional AutoEncoder (termed CAE), which is based on an autoencoder and entropy regularization to disguise true labels. To further defend against attackers with sufficient prior label knowledge, we introduce DiscreteSGD-enhanced CAE (termed DCAE), and show that DCAE achieves significantly higher main task accuracy than other known methods when defending against various label inference attacks.
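Why can a batch-averaged gradient still leak labels? The sketch below gives a minimal intuition under deliberately simplified, hypothetical assumptions. There is a single linear softmax layer (logits = X @ W) trained with mean cross-entropy. The attacker knows the inputs X, the current weights W, and the resulting logits, and the batch size B is no larger than the feature dimension d. No HE is simulated. The helper names `softmax`, `batch_avg_gradient`, and `invert_labels` are illustrative. The closed-form least-squares inversion used here stands in for the paper's gradient inversion attack and is not the authors' procedure. The paper's threat model and optimization may differ.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def batch_avg_gradient(X, W, Y):
    # Gradient of the batch-mean softmax cross-entropy w.r.t. W for logits = X @ W.
    # Only this (d x C) batch-averaged matrix is assumed to be revealed.
    B = X.shape[0]
    return X.T @ (softmax(X @ W) - Y) / B

def invert_labels(X, W, G):
    # Hypothetical toy inversion: solve X.T @ (P - Y) = B * G for (P - Y).
    # If X has full row rank (needs B <= d), the solution is unique,
    # so the one-hot labels Y can be read back exactly from G.
    B = X.shape[0]
    P = softmax(X @ W)
    diff, *_ = np.linalg.lstsq(X.T, B * G, rcond=None)  # diff approximates P - Y
    Y_hat = P - diff
    return Y_hat.argmax(axis=1)

# Usage: 8 samples, 32 features, 10 classes (toy sizes chosen for illustration).
rng = np.random.default_rng(0)
B, d, C = 8, 32, 10
X = rng.normal(size=(B, d))
W = rng.normal(size=(d, C)) * 0.1
labels = rng.integers(0, C, size=B)
Y = np.eye(C)[labels]

G = batch_avg_gradient(X, W, Y)
recovered = invert_labels(X, W, G)
print((recovered == labels).all())  # True
```

In this toy, averaging over the batch hides nothing because the linear system is fully determined when B <= d. With B > d the system becomes underdetermined and exact recovery is no longer guaranteed. That regime is where an optimization-based inversion, such as the attack studied in the paper, would be needed.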