Provable Privacy Attacks on Trained Shallow Neural Networks

Guy Smorodinsky, Gal Vardi, Itay Safran
2024-10-10
Abstract: We study what provable privacy attacks can be shown on trained, 2-layer ReLU neural networks. We explore two types of attacks; data reconstruction attacks, and membership inference attacks. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used to provably reconstruct a set of which at least a constant fraction are training points in a univariate setting, and can also be used to identify with high probability whether a given point was used in the training set in a high dimensional setting. To the best of our knowledge, our work is the first to show provable vulnerabilities in this setting.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the privacy vulnerabilities of trained shallow neural networks, specifically two-layer ReLU networks. The authors study two types of attacks: data reconstruction attacks and membership inference attacks. The core question is whether a trained neural network provably leaks sensitive information about its training data.

### Background and Problem Description of the Paper

Recent research has shown that theoretical tools can be used to reconstruct part of the training set from a trained neural network. These attacks exploit the implicit bias of the training algorithm: the network tends to converge to specific solutions during training, and the structure of those solutions reveals information about the training set. However, despite many empirical studies, previous work has not given a rigorous theoretical explanation of why such reconstructions are possible.

The authors aim to prove rigorously that these attacks are possible and to show how an attacker can carry them out under different assumptions. To the best of their knowledge, this is the first work to show provable vulnerabilities in this setting. Specifically:

1. **Data Reconstruction Attack**: In the univariate case, the authors prove that under certain conditions an attacker can reconstruct a set of points of which at least a constant fraction are training points.
2. **Membership Inference Attack**: In the high-dimensional case, the authors prove that an attacker can determine with high probability whether a given point belongs to the training set by analyzing the network's output values.

### Main Contributions

- **Data Reconstruction in the Univariate Case**: Under Assumption 2.1, an attacker can reconstruct a set of points of which at least a constant fraction are training points. This fraction does not depend on the size of the training set.
- **Membership Inference Attack in the High-Dimensional Case**: Under Assumption 4.1, an attacker can carry out a membership inference attack. In experiments, the attack remains effective even when some of the assumptions are relaxed.
- **Experimental Evidence**: Experiments corroborate the theoretical results and suggest that these vulnerabilities may exist in a wider range of settings.

### Related Work

- **Data Reconstruction Attack**: Previous research mainly studied data reconstruction attacks on generative models (such as large language models and diffusion models) and in federated learning. This paper instead uses implicit bias to obtain rigorous theoretical guarantees.
- **Membership Inference Attack**: This type of attack aims to determine whether a data point belongs to the training set by exploiting differences in how the trained model behaves on training data versus other data. This paper provides a membership inference attack with a provable success rate.

### Conclusion

This paper reveals privacy vulnerabilities in trained shallow neural networks through rigorous theoretical analysis and experimental evidence. Although the attacks are proven only under certain assumptions, experiments show that they may also be effective in a wider range of scenarios. The authors hope that this work provides a theoretical basis for future research on privacy attacks and defenses and inspires more in-depth study of the privacy of neural networks.
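
To illustrate what "analyzing the output values of the neural network" can look like for the membership inference attack summarized above, here is a minimal sketch in Python. It is not the authors' construction. It assumes, purely for illustration, that training points produce outputs of larger magnitude than fresh points, so a simple threshold on the output magnitude separates the two. The function names, the threshold value, and the toy weights `W`, `b`, `v` are all hypothetical.

```python
import numpy as np

def two_layer_relu(x, W, b, v):
    """Forward pass of a 2-layer ReLU network: sum_j v_j * relu(w_j . x + b_j)."""
    hidden = np.maximum(W @ x + b, 0.0)
    return float(v @ hidden)

def infer_membership(x, W, b, v, threshold=0.5):
    """Guess 'member' when the output magnitude |N(x)| reaches the threshold.

    Assumption (not taken from the source): training points yield outputs of
    larger magnitude than fresh points. The threshold 0.5 is hypothetical.
    """
    return abs(two_layer_relu(x, W, b, v)) >= threshold

# Toy network with hand-set weights (hypothetical, not a trained model).
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.zeros(2)
v = np.array([1.0, -1.0])

print(infer_membership(np.array([1.0, 0.0]), W, b, v))  # output magnitude 1.0
print(infer_membership(np.array([0.1, 0.1]), W, b, v))  # output magnitude 0.0
```

In this toy example the first point is flagged as a member and the second is not. A real attack would need a threshold justified by the training setup, which is the role the paper's assumptions play.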