New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook

Meng Yang, Tianqing Zhu, Chi Liu, WanLei Zhou, Shui Yu, Philip S. Yu
2024-11-12
Abstract: Thanks to the explosive growth of data and the development of computational resources, it is possible to build pre-trained models that can achieve outstanding performance on various tasks, such as natural language processing, computer vision, and more. Despite their powerful capabilities, pre-trained models have also drawn attention to the emerging security challenges associated with their real-world applications. Security and privacy issues, such as leaking private information and generating harmful responses, have seriously undermined users' confidence in these powerful models. Concerns are growing as model performance improves dramatically. Researchers are eager to explore the unique security and privacy issues that have emerged, their distinguishing factors, and how to defend against them. However, the current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models, which hinders a high-level and comprehensive understanding of these questions. To fill the gap, we conduct a systematic survey on the security risks of pre-trained models, proposing a taxonomy of attack and defense methods based on the accessibility of pre-trained models' input and weights in various security test scenarios. This taxonomy categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches. With the taxonomy analysis, we capture the unique security and privacy issues of pre-trained models, categorizing and summarizing existing security issues based on their characteristics. In addition, we offer a timely and comprehensive review of each category's strengths and limitations. Our survey concludes by highlighting potential new research opportunities in the security and privacy of pre-trained models.
Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to address the emerging security and privacy challenges faced by pre-trained models in practical applications. Although pre-trained models perform excellently in fields such as natural language processing and computer vision, they also give rise to a series of security issues, such as privacy leakage and generation of harmful responses, which seriously undermine users' trust in these powerful models. Specifically, the paper focuses on the following problems:

1. **Security and privacy issues**:
   - **Privacy information leakage**: For example, through membership inference attacks, an attacker can infer whether specific data belongs to the training set.
   - **Generation of harmful responses**: For example, through jailbreak attacks, an attacker can make the model generate harmful content by carefully designing input prompts.
2. **Lack of systematic classification**:
   - There is a lack of clear classification of emerging attacks and defense methods for pre-trained models in the current literature, which hinders a high-level and comprehensive understanding of these problems.
3. **Root causes of new security issues**:
   - The unique training strategies and large-scale data sets of pre-trained models introduce new security issues, which are different from traditional models and require in-depth exploration of their root causes.
4. **Impact of model scale**:
   - What unique security and privacy issues arise as the model scale increases? Why do these issues occur? How do these issues change with the change of model scale?

To solve these problems, the paper proposes a new taxonomy, which classifies attack and defense methods into different categories according to the accessibility of the input and weights of the pre-trained model, and summarizes the characteristics of existing security issues and the advantages and disadvantages of defense strategies. In addition, the paper also points out new directions for future research, in the hope of establishing a higher-standard protection mechanism for pre-trained models and enhancing users' confidence in these models.

### The main contributions of the paper include:

- Proposing a new taxonomy to classify current technologies based on attack and defense stages and specific strategies.
- Comprehensively summarizing the latest attack and defense technologies and showing their advantages and disadvantages.
- Reviewing attack and defense methods for pre-trained models of different scales, and summarizing their commonalities and differences.
- In-depth discussion of open security and privacy issues in pre-trained models and pointing out possible further research directions.

Through these efforts, the paper hopes to help comprehensively evaluate the security and privacy risks of pre-trained models, establish a standardized evaluation system, so as to more accurately assess risks and ultimately improve users' trust in pre-trained models.
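
To make the membership inference attack mentioned in the summary more concrete, here is a minimal sketch of one common heuristic, a loss-threshold test: examples on which a model has unusually low loss are guessed to be training members. This is an illustration only and is not taken from the paper. The function names, the confidence values, and the threshold are all hypothetical assumptions chosen for the example.

```python
import math


def cross_entropy(prob_true_label: float) -> float:
    """Per-example loss, given the model's probability for the correct label."""
    # Clamp to avoid log(0) for extremely small probabilities.
    return -math.log(max(prob_true_label, 1e-12))


def infer_membership(prob_true_label: float, threshold: float) -> bool:
    """Guess 'training member' when the loss falls below the threshold."""
    return cross_entropy(prob_true_label) < threshold


# Hypothetical model confidences (made up for illustration, not real results).
# Models often assign higher confidence to examples they were trained on.
member_probs = [0.99, 0.97, 0.95]
non_member_probs = [0.60, 0.45, 0.70]

# Hypothetical threshold; a real attacker would have to calibrate it,
# for example on data whose membership status is already known.
threshold = 0.2

member_guesses = [infer_membership(p, threshold) for p in member_probs]
non_member_guesses = [infer_membership(p, threshold) for p in non_member_probs]

print("guesses for members:    ", member_guesses)
print("guesses for non-members:", non_member_guesses)
```

The sketch only needs the model's output probabilities, not its weights or modified inputs, which is why attacks of this kind are a privacy concern even for models exposed purely through a prediction interface.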