Model Inversion Attacks: A Survey of Approaches and Countermeasures

Zhanke Zhou, Jianing Zhu, Fengfei Yu, Xuan Li, Xiong Peng, Tongliang Liu, Bo Han
2024-11-15
Abstract: The success of deep neural networks has driven numerous research studies and applications from Euclidean to non-Euclidean data. However, there are increasing concerns about privacy leakage, as these networks rely on processing private data. Recently, a new type of privacy attack, the model inversion attacks (MIAs), aims to extract sensitive features of private data for training by abusing access to a well-trained model. The effectiveness of MIAs has been demonstrated in various domains, including images, texts, and graphs. These attacks highlight the vulnerability of neural networks and raise awareness about the risk of privacy leakage within the research community. Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs across different domains. This survey aims to summarize up-to-date MIA methods in both attacks and defenses, highlighting their contributions and limitations, underlying modeling principles, optimization challenges, and future directions. We hope this survey bridges the gap in the literature and facilitates future research in this critical area. Besides, we are maintaining a repository to keep track of relevant research at <a class="link-external link-https" href="https://github.com/AndrewZhou924/Awesome-model-inversion-attack" rel="external noopener nofollow">this https URL</a>.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the privacy leakage risk posed by model inversion attacks (MIAs). As deep neural networks have succeeded, more research and practical applications have come to rely on processing private data, which raises concerns about privacy leakage. MIAs are a new type of privacy attack in which an attacker abuses access to a well-trained model to extract sensitive features of the private data used for training. The attack has been demonstrated in multiple domains, including images, texts, and graphs, which highlights the vulnerability of neural networks and has drawn the research community's attention to the risk of privacy leakage.

The main objectives of the paper include:

1. **Systematically summarize existing methods**: Provide a comprehensive review covering both attack and defense methods for MIAs, highlighting their contributions and limitations.
2. **Explain the modeling principles**: Describe in detail the underlying modeling principles and optimization challenges of MIAs.
3. **Explore future directions**: Discuss current challenges and future research directions to promote further development in this critical area.

### Core issues of the paper

The core issue of the paper is how to deal with the privacy leakage risk posed by model inversion attacks. Specifically:

- **Define model inversion attacks**: Formally define MIAs and clarify their attack goals and defense goals.
- **Classification and analysis**: Classify MIAs by data domain (such as images, texts, and graphs), and analyze the effectiveness and limitations of each method.
- **Optimization strategies**: Propose general principles for enhancing MIAs or defending against them, such as improving query strategies, using internal model information, and using output probabilities.
- **Comparison and contrast**: Compare MIAs with other types of privacy attacks (such as model stealing attacks, membership inference attacks, and gradient inversion attacks), and clarify the relationships and differences between them.

### Formula representation

In MIAs, given a trained model \( f_{\theta} \) and prior knowledge \( K \), the attacker aims to find a reverse hypothesis \( f^{-1}_{\phi} \) that recovers the training data \( X_{\text{train}} \). That is:

\[ f^{-1}_{\phi}(f_{\theta}, K)=\hat{X}_{\text{train}} \]

Here, the recovered data \( \hat{X}_{\text{train}} \) is a set of data samples that are expected to approximate the samples in \( X_{\text{train}} \). The goal of the attack is to make the recovered data as close as possible to the original training data, that is, to minimize the distance:

\[ \min_{\hat{X}_{\text{train}}} d(\hat{X}_{\text{train}}, X_{\text{train}}) \]

The goal of the defense is to maximize this distance while maintaining the performance of the model on the test set:

\[ \max_{\hat{X}_{\text{train}}} d(\hat{X}_{\text{train}}, X_{\text{train}}) \]

### Summary

Through a comprehensive review of MIAs, this paper aims to help the academic and industrial communities better understand the threat posed by this attack and to provide guidance for future privacy-protection research. By systematically summarizing existing attack and defense methods, the paper hopes to bridge the gaps in the literature and promote further development in this important area.
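
To make the attack objective from the formula section more concrete, the following is a minimal sketch of a white-box inversion on a toy model. It is not a method taken from the survey: the target model \( f_{\theta} \) is assumed to be a small softmax-regression classifier, the prior knowledge \( K \) is reduced to an L2 penalty on the input, the distance \( d \) is assumed to be cosine distance, and every name (`train_target_model`, `invert_class`, `demo`) is hypothetical. The attacker runs gradient ascent on the model's output probability for one class and then checks how close the result is to that class's private training data.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_target_model(X, y, n_classes, lr=0.05, steps=200):
    """Fit the toy target model f_theta (softmax regression, no bias term)."""
    W = np.zeros((n_classes, X.shape[1]))
    Y = np.eye(n_classes)[y]                    # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W.T)
        W -= lr * (P - Y).T @ X / len(X)        # cross-entropy gradient step
    return W

def invert_class(W, target, lr=0.1, steps=500, l2=0.05):
    """Gradient ascent on log p(target | x) - (l2 / 2) * ||x||^2 from x = 0."""
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        p = softmax(W @ x)
        # d/dx log p_target(x) = W[target] - sum_k p_k * W[k]
        x += lr * (W[target] - p @ W - l2 * x)
    return x

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def demo(seed=0, n_classes=3, dim=64, per_class=100):
    rng = np.random.default_rng(seed)
    # Synthetic "private" data: one random prototype per class plus small noise.
    prototypes = rng.normal(size=(n_classes, dim))
    y = np.repeat(np.arange(n_classes), per_class)
    X_train = prototypes[y] + 0.1 * rng.normal(size=(len(y), dim))

    W = train_target_model(X_train, y, n_classes)
    class_means = np.stack([X_train[y == c].mean(axis=0) for c in range(n_classes)])
    recovered = np.stack([invert_class(W, c) for c in range(n_classes)])

    # dist[i, j] = d(reconstruction for class i, mean training sample of class j)
    return np.array([[cosine_distance(r, m) for m in class_means] for r in recovered])

if __name__ == "__main__":
    print(np.round(demo(), 3))
```

In this toy setting each reconstruction should lie closest to the mean of its own class, which is the sense in which the attack drives \( d(\hat{X}_{\text{train}}, X_{\text{train}}) \) down. Note that the sketch recovers a class-representative input rather than individual training samples, and that a defense in the sense above would have to raise these diagonal distances without hurting test accuracy.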