Discovering Universal Semantic Triggers for Text-to-Image Synthesis

Shengfang Zhai, Weilong Wang, Jiajun Li, Yinpeng Dong, Hang Su, Qingni Shen
2024-02-13
Abstract: Recently, text-to-image models have gained widespread attention in the community due to their controllable and high-quality generation ability. However, the robustness of such models and their potential ethical issues have not been fully explored. In this paper, we introduce the Universal Semantic Trigger, a meaningless token sequence that can be added at any location within the input text yet can steer generated images towards a preset semantic target. To thoroughly investigate it, we propose the Semantic Gradient-based Search (SGS) framework. SGS automatically discovers potential universal semantic triggers based on the given semantic targets. Furthermore, we design evaluation metrics to comprehensively evaluate the semantic shift of images caused by these triggers. Our empirical analyses reveal that mainstream open-source text-to-image models are vulnerable to our triggers, which could pose significant ethical threats. Our work contributes to a further understanding of text-to-image synthesis and helps users automatically audit their models before deployment.
Artificial Intelligence, Cryptography and Security
### What problems does this paper attempt to solve?

This paper aims to explore and reveal a potential security threat in text-to-image generation models, namely **Universal Semantic Triggers (UST)**. Specifically, the authors focus on how to automatically discover these triggers and evaluate their impact on the generated images.

#### Main problems:

1. **Security and robustness issues**: Although existing text-to-image models have made significant progress in generating high-quality images, their robustness and potential ethical issues have not been fully studied. Malicious users may use these models to generate images containing harmful or sensitive information, thus causing harm to society.
2. **Automated discovery of Hidden Vocabulary**: Previous research on "Hidden Vocabulary" mainly relies on empirical construction and lacks efficient automated methods. Therefore, the authors hope to explore an automated method to discover these hidden vocabularies, that is, universal semantic triggers.
3. **Quantitative evaluation of semantic shift**: Traditional metrics such as Attack Success Rate (ASR) may not be able to fully capture the effects of these triggers. Therefore, new evaluation metrics need to be designed to quantify the semantic changes in images caused by the triggers.

#### Specific goals:

- **Define and discover universal semantic triggers**: These triggers are seemingly meaningless token sequences that can be inserted at any position in the input text but can guide the generated image towards a preset semantic target.
- **Develop an automated search framework**: Propose the Semantic Gradient-based Search (SGS) framework to automatically discover universal semantic triggers based on a given semantic target.
- **Design a quantitative evaluation metric**: Introduce SemSR (Semantic Shift Rate) as a quantitative metric for evaluating semantic shift, which calculates the similarity between the image and the target semantics based on the multi-modal embedding space of the CLIP model.

Through the above work, the authors hope to raise the community's awareness of potential threats in text-to-image models and provide users with an automatic auditing tool to ensure the safety of these models before deployment.

### Summary

The core problem of this paper is to explore security vulnerabilities in text-to-image generation models, especially the existence of universal semantic triggers and their potential risks. By proposing the SGS framework and the SemSR metric, the authors provide a method for automatically discovering and evaluating these triggers to enhance the security and robustness of the models.
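
To make the idea of a similarity-based semantic shift metric concrete, here is a minimal sketch in Python. It is not the paper's SemSR formula, which this summary does not reproduce. It assumes that image and text embeddings have already been computed by a CLIP-style encoder and are available as plain lists of floats, and it counts an image as "shifted" when it is closer (by cosine similarity) to the target text than to the original prompt. The function names `cosine_similarity` and `shift_rate_sketch` and the decision rule are illustrative assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def shift_rate_sketch(
    image_embeddings: Sequence[Vector],
    target_embedding: Vector,
    source_embedding: Vector,
) -> float:
    """Fraction of images that sit closer to the target than to the source.

    Assumption (not from the paper): an image counts as semantically
    shifted when its similarity to the target text embedding is strictly
    greater than its similarity to the original prompt embedding.
    """
    if not image_embeddings:
        raise ValueError("need at least one image embedding")
    shifted = sum(
        1
        for img in image_embeddings
        if cosine_similarity(img, target_embedding)
        > cosine_similarity(img, source_embedding)
    )
    return shifted / len(image_embeddings)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real CLIP embeddings.
    target = [1.0, 0.0, 0.0]   # hypothetical target-concept text embedding
    source = [0.0, 1.0, 0.0]   # hypothetical original-prompt text embedding
    images = [
        [0.9, 0.1, 0.0],  # closer to target
        [0.2, 0.8, 0.0],  # closer to source
        [0.7, 0.3, 0.1],  # closer to target
        [0.1, 0.9, 0.2],  # closer to source
    ]
    print(shift_rate_sketch(images, target, source))  # prints 0.5
```

In practice the embeddings would come from a real CLIP image and text encoder, and the thresholding rule would follow whatever definition the paper gives; the sketch only shows the general shape of comparing generated images against a target semantics in a shared embedding space.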