One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks

Ji Guo, Wenbo Jiang, Rui Zhang, Guoming Lu, Hongwei Li
2024-10-31
Abstract: Recently, the success of Text-to-Image (T2I) models has led to the rise of numerous third-party platforms, which claim to provide cheaper API services and more flexibility in model options. However, this also raises a new security concern: are these third-party services truly offering the models they claim? To address this problem, we propose the first T2I model verification method, named Text-to-Image Model Verification via Non-Transferable Adversarial Attacks (TVN). The non-transferability of adversarial examples means that these examples are effective only on a target model and ineffective on other models, which allows the target model to be verified. TVN uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the cosine similarity of a prompt's text encoding, generating non-transferable adversarial prompts. By computing the CLIP-text scores between the generated images and the adversarial prompts with their perturbations removed, we can verify, based on a 3-sigma threshold, whether the model matches the claimed target model. Experiments show that TVN performs well in both closed-set and open-set scenarios, achieving a verification accuracy of over 90%. Moreover, the adversarial prompts generated by TVN significantly reduce the CLIP-text scores of the target model while having little effect on other models.
Computer Vision and Pattern Recognition
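The optimization step in the abstract can be illustrated with the ranking routine at the heart of NSGA-II. The sketch below is a minimal, hypothetical formulation, not the authors' code: it assumes each candidate adversarial prompt already has text embeddings from the target model's encoder and from the other models' encoders, and it scores candidates on two objectives (both minimized) built from cosine similarity. The names `objectives`, `f1` and `f2` are illustrative assumptions. Only fast non-dominated sorting is shown; a full NSGA-II loop would add crowding distance, selection, crossover and mutation of the prompt perturbations.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def objectives(adv_target: Vector, clean_target: Vector,
               adv_others: List[Vector], clean_others: List[Vector]) -> Tuple[float, float]:
    """Hypothetical two-objective formulation, both minimized:
    f1 = similarity of adversarial vs. clean prompt under the target encoder
         (lower = perturbation hurts the target model more),
    f2 = 1 - mean similarity under the other models' encoders
         (lower = perturbation leaves other models unaffected)."""
    f1 = cosine(adv_target, clean_target)
    sims = [cosine(a, c) for a, c in zip(adv_others, clean_others)]
    f2 = 1.0 - sum(sims) / len(sims)
    return f1, f2


def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """p Pareto-dominates q: no worse in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def fast_non_dominated_sort(points: List[Tuple[float, ...]]) -> List[List[int]]:
    """NSGA-II fast non-dominated sorting; returns fronts of indices, best front first."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # indices each solution dominates
    counts = [0] * n  # how many solutions dominate each solution
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Illustrative (f1, f2) values for four candidate prompts, not data from the paper.
candidates = [(0.2, 0.1), (0.5, 0.05), (0.6, 0.3), (0.9, 0.2)]
print(fast_non_dominated_sort(candidates))  # [[0, 1], [2, 3]]
```

The first front holds the trade-off candidates that are best at being non-transferable; NSGA-II would keep these preferentially when forming the next generation.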
### What problem does this paper attempt to solve?

This paper addresses the problem of verifying text-to-image (T2I) models in a black-box setting. As T2I models have succeeded, many third-party platforms claim to offer cheaper API services and a wider choice of models. This raises a new security question: do these third-party services actually provide the models they claim?

#### Background problems

1. **Security of third-party platforms**: A third-party platform may claim to serve an expensive model (such as DALL-E 3) while actually serving a cheaper one (such as Stable Diffusion v1.4). This lets the platform profit illegitimately and harms users' rights and interests.
2. **Limitations of existing methods**: Existing model verification methods mainly target large language models (LLMs). They identify the model version by sending carefully designed queries and analyzing the responses. These methods do not carry over to T2I models, because T2I models output images rather than text and cannot directly report information about themselves.

#### Solutions

To address these problems, the authors propose the first T2I model verification method, Text-to-Image Model Verification via Non-Transferable Adversarial Attacks (TVN). Its main ideas are:

- **Generate non-transferable adversarial samples**: Optimize specific perturbations so that the adversarial samples are effective only on the target model and ineffective on other models.
- **Calculate CLIP-text scores**: Compute the CLIP-text score between each generated image and the original (unperturbed) prompt, and decide whether the model is the target model using a 3-sigma threshold.

#### Specific implementation

- **NSGA-II optimization algorithm**: Use the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the cosine similarity of the prompt's text encoding, producing adversarial prompts that are non-transferable.
- **Evaluation metrics**: Evaluate TVN with CLIP-text scores, accuracy, precision, recall, and F1-score.

#### Experimental results

- **Closed-set scenario**: TVN performs well. Its adversarial prompts significantly reduce the CLIP-text score of the target model while having relatively little effect on other models.
- **Open-set scenario**: TVN also performs well and effectively distinguishes the target model from other models.

In both scenarios, TVN achieves a verification accuracy of over 90%.

In conclusion, this paper proposes a novel method for verifying T2I models in a black-box setting, addresses the potential fraud of current third-party platforms, and offers an effective solution for practical use.
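The verification decision and the evaluation metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: it assumes CLIP-text scores are already computed (for example with a CLIP model), that the verifier holds reference scores from the genuine target model on images generated from the adversarial prompt, and that a suspect service is accepted as the target when the mean of its scores falls inside the band mean ± 3 standard deviations of those reference scores. The helper names are hypothetical, and the numbers in the usage lines are illustrative only.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def three_sigma_band(reference_scores: Sequence[float]) -> Tuple[float, float]:
    """Acceptance band [mu - 3*sigma, mu + 3*sigma] from the genuine target
    model's CLIP-text scores (assumed setup, not the paper's exact rule)."""
    mu = statistics.mean(reference_scores)
    sigma = statistics.stdev(reference_scores)  # sample std; needs at least 2 scores
    return mu - 3 * sigma, mu + 3 * sigma


def is_target(suspect_scores: Sequence[float], band: Tuple[float, float]) -> bool:
    """Accept the suspect service as the target model if its mean score lies in the band."""
    low, high = band
    return low <= statistics.mean(suspect_scores) <= high


def metrics(predicted: List[bool], actual: List[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with "is the target model" as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative scores only: the adversarial prompt keeps the genuine model's
# scores low, while a substituted model still scores the prompt highly.
band = three_sigma_band([20.0, 21.0, 22.0, 21.0, 21.0])
print(is_target([21.5, 20.5], band))  # True
print(is_target([30.0, 31.0], band))  # False
print(metrics([True, False, True, False], [True, False, False, False]))
```

A one-band rule like this keeps the decision cheap for a black-box API, since it needs only the generated images and a fixed set of reference scores.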