Defending Against Similarity Shift Attack for EaaS Via Adaptive Multi-Target Watermarking

Zuopeng Yang, Pengyu Chen, Tao Li, Kangjun Liu, Yuan Huang, Xin Lin
DOI: https://doi.org/10.1016/j.ins.2024.120893
IF: 8.1
2024-01-01
Information Sciences
Abstract: Large language models have revolutionized natural language processing, leading to the emergence of Embedding as a Service (EaaS). While EaaS facilitates access to advanced embedding models, it also presents challenges in copyright protection. Current research primarily relies on single-target watermarking frameworks, where a predefined vector is integrated as a watermark into text embeddings. However, these approaches are vulnerable to watermark information leakage. To investigate this issue, we introduce the Embedding Similarity Shift Attack (ESSA), an innovative attack algorithm designed to detect trigger instances in single-target watermarking systems by analyzing similarity shifts among constructed reference sentence pairs. Additionally, to defend against such an attack, we propose Adaptive Multi-Target Watermarking (AMT-WM). AMT-WM stands as the pioneering multi-target watermarking method aimed at safeguarding the copyright of EaaS. Specifically, AMT-WM constructs multiple watermarks through the utilization of orthogonal vectors to mitigate selection bias towards a particular vector. Furthermore, it incorporates a randomly selected sentence embedding as the base embedding to enhance the confidentiality of backdoored embeddings. For multi-target watermarking, we implement adaptive watermark injection and validation based on similarity. Comprehensive experiments conducted on various datasets validate the effectiveness of ESSA in trigger detection performance and the efficacy of AMT-WM in copyright protection. Our code will be available soon.
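Illustrative sketch (not the authors' code, which the abstract says is not yet released): the multi-target idea of orthogonal watermark vectors with similarity-based selection might look roughly like the Python below. Everything here is an assumption made for illustration: the function names orthogonal_targets and inject, the Gram-Schmidt construction, the rule "blend toward the most similar target", and the weight parameter are all hypothetical. The random base embedding and the validation step mentioned in the abstract are not modeled.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def orthogonal_targets(k, dim, seed=0):
    """Build k mutually orthogonal unit vectors (requires k <= dim).

    Assumption: Gram-Schmidt on random Gaussian vectors; the paper may
    construct its orthogonal watermark vectors differently.
    """
    rng = random.Random(seed)
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the vectors already accepted.
        for b in basis:
            proj = dot(v, b)
            v = [x - proj * y for x, y in zip(v, b)]
        if math.sqrt(dot(v, v)) > 1e-8:  # skip degenerate draws
            basis.append(normalize(v))
    return basis


def inject(embedding, targets, weight):
    """Blend an embedding toward the target it is already most similar to.

    Assumption: 'adaptive' is read here as picking the target by cosine
    similarity; weight in [0, 1) is a hypothetical watermark strength.
    Returns the watermarked unit vector and the chosen target index.
    """
    e = normalize(embedding)
    sims = [dot(e, t) for t in targets]
    idx = max(range(len(targets)), key=lambda i: sims[i])
    mixed = [(1.0 - weight) * x + weight * t for x, t in zip(e, targets[idx])]
    return normalize(mixed), idx
```

With weight set to 0 the embedding is returned unchanged (up to normalization); larger weights pull it toward the selected target, which is the kind of similarity shift that a verifier, or an attacker such as ESSA in the single-target setting, would look for.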