Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning

Andrei Semenov, Philip Zmushko, Alexander Pichugin, Aleksandr Beznosikov
2024-12-16
Abstract: Vertical Federated Learning (VFL) aims to enable collaborative training of deep learning models while maintaining privacy protection. However, the VFL procedure still has components that are vulnerable to attacks by malicious parties. In our work, we consider feature reconstruction attacks, a common risk targeting input data compromise. We theoretically claim that feature reconstruction attacks cannot succeed without knowledge of the prior distribution on data. Consequently, we demonstrate that even simple model architecture transformations can significantly impact the protection of input data during VFL. Confirming these findings with experimental results, we show that MLP-based models are resistant to state-of-the-art feature reconstruction attacks.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The problem this paper addresses is how to prevent feature reconstruction attacks in Vertical Federated Learning (VFL). Specifically, the authors focus on Split Learning (SL) scenarios, in which a malicious server infers and reconstructs clients' private data by exploiting access to the model architecture or to auxiliary datasets. The attacks considered are:

1. **Model Inversion (MI) attacks**: The attacker uses knowledge of the model architecture to reverse-engineer the original input data from the activation values.
2. **Feature-space Hijacking Attack (FSHA)**: The attacker uses a public dataset with the same distribution as the training data to guide the client model toward a specific hidden space, from which the private features can be reconstructed.

### Main problems of the paper

- **Effectiveness of feature reconstruction attacks**: Do existing feature reconstruction attacks (such as MI and FSHA) really rely on prior knowledge of the data distribution?
- **Impact of model architecture**: Do different model architectures (such as MLP vs. CNN) affect the success rate of these attacks?
- **Theoretical basis of privacy protection**: Can it be shown theoretically that some simple model transformations significantly improve data security?

### Solutions

The authors address these questions in two ways:

1. **Theoretical analysis**:
   - It is shown that, without prior knowledge of the data distribution, feature reconstruction attacks cannot succeed.
   - It is pointed out that even simple model architecture transformations (such as using an MLP instead of a CNN) can significantly improve data security.
2. **Experimental verification**:
   - Experiments show that MLP-based models are highly resistant to state-of-the-art feature reconstruction attacks.
   - FID (Fréchet Inception Distance), used to measure the quality of the reconstructed images, confirms the advantage of the MLP-based model.
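The theoretical point above, that activations alone do not pin down the input without a data prior, can be illustrated with a small numerical sketch. The construction below is an assumption chosen for illustration and is not taken from the summary: it uses a hypothetical bias-free dense first layer, toy values for `X` and `W`, and a 2D rotation as the orthogonal matrix `U`. Rotating the data and compensating in the weights leaves the activations seen by the server unchanged.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def rotation(theta):
    """2x2 rotation matrix, a simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Hypothetical private data: 2 features (rows) x 3 samples (columns).
X = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 4.0]]
# Hypothetical client weights of a bias-free dense layer: 3 hidden units x 2 features.
W = [[0.2, -0.7],
     [1.5, 0.3],
     [-0.4, 0.9]]

U = rotation(0.8)                # any orthogonal matrix would do
X_alt = matmul(U, X)             # a different dataset
W_alt = matmul(W, transpose(U))  # weights that compensate for the rotation

H = matmul(W, X)                 # activations the server observes
H_alt = matmul(W_alt, X_alt)     # same activations, produced by different data

# Largest element-wise difference between the two activation matrices.
max_gap = max(abs(a - b) for ra, rb in zip(H, H_alt) for a, b in zip(ra, rb))
# Largest element-wise difference between the two datasets.
data_gap = max(abs(a - b) for ra, rb in zip(X, X_alt) for a, b in zip(ra, rb))
```

Here `max_gap` is at floating-point noise level while `data_gap` is clearly non-zero, so an attacker who sees only `H` has no way to prefer `X` over `X_alt` unless a prior on the data rules one of them out.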
### Formula representation

The key optimization problems involved in the paper are listed below.

- **Optimization problem of the UnSplit attack**, where \(\tilde{X}\) and \(\tilde{W}\) are the attacker's estimates of the private input \(X\) and the client weights \(W\), and \(\text{TV}\) is the total variation regularizer:
  \[ \tilde{X}^* = \arg\min_{\tilde{X}} \mathcal{L}_{\text{MSE}}\big(\tilde{f}(\tilde{X}, \tilde{W}), f(X, W)\big) + \lambda\,\text{TV}(\tilde{X}) \]
  \[ \tilde{W}^* = \arg\min_{\tilde{W}} \mathcal{L}_{\text{MSE}}\big(\tilde{f}(\tilde{X}, \tilde{W}), f(X, W)\big) \]
- **Optimization problem of the FSHA attack**, where \(\psi_E\) and \(\psi_D\) are the attacker's encoder and decoder trained on the public data \(X_{pub}\), and \(D\) is a discriminator:
  \[ \psi_E^*, \psi_D^* = \arg\min_{\psi_E, \psi_D} \mathcal{L}_{\text{MSE}}\big(\psi_D(\psi_E(X_{pub})), X_{pub}\big) \]
  \[ D^* = \arg\min_{D} \big[\log\big(1 - D(\psi_E(X_{pub}))\big) + \log\big(D(f(X))\big)\big] \]
  \[ f^* = \arg\min_{f} \big[\log\big(1 - D(f(X))\big)\big] \]

### Summary

Through theoretical analysis and experiments, this paper shows that in VFL simple model architecture transformations (such as using an MLP) can effectively prevent feature reconstruction attacks without the need for an additional defense framework. This provides new ideas and methods for improving the security of VFL systems.
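As a concrete reading of the UnSplit objective from the formula section, the following minimal sketch evaluates the MSE term plus the total variation penalty for a candidate reconstruction. Several pieces are assumptions made for illustration only: the client model is a toy bias-free dense layer (`client_model`), the total variation is the anisotropic form (sum of absolute neighbour differences), and the 2x2 image, weights, and `lam` value are hypothetical. The sketch evaluates the objective; it does not run the alternating minimization over the input and weight estimates.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(img), len(img[0])
    vertical = sum(abs(img[i + 1][j] - img[i][j])
                   for i in range(rows - 1) for j in range(cols))
    horizontal = sum(abs(img[i][j + 1] - img[i][j])
                     for i in range(rows) for j in range(cols - 1))
    return vertical + horizontal


def client_model(img, weights):
    """Toy stand-in for f(X, W): flatten the image, apply a bias-free dense layer."""
    flat = [p for row in img for p in row]
    return [sum(w * p for w, p in zip(w_row, flat)) for w_row in weights]


def unsplit_objective(x_guess, w_guess, target_activations, lam):
    """MSE between guessed and observed activations plus lam * TV(x_guess)."""
    fit = mse(client_model(x_guess, w_guess), target_activations)
    return fit + lam * total_variation(x_guess)


# Hypothetical 2x2 private image and client weights (2 hidden units x 4 pixels).
X = [[0.0, 1.0], [1.0, 0.0]]
W = [[0.5, -0.5, 0.25, 0.0], [1.0, 0.0, -1.0, 0.5]]
target = client_model(X, W)  # activations the server observes

# The true input fits the activations exactly, but pays a TV penalty.
true_input_score = unsplit_objective(X, W, target, lam=0.1)

# A flat image has zero TV but does not reproduce the activations.
flat_guess = [[0.5, 0.5], [0.5, 0.5]]
flat_guess_score = unsplit_objective(flat_guess, W, target, lam=0.1)
```

The TV term acts as a smoothness prior on images, which is one place where knowledge about the data distribution enters the attack.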