Abstract: Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against CSE attack.

### What problem does this paper attempt to solve?

This paper addresses model extraction attacks against Embedding-as-a-Service (EaaS). Specifically, it shows that an existing watermark-based defense can be removed, and it proposes a new defense mechanism so that EaaS providers can still verify when their copyright has been infringed.

#### Main problem background

1. **Model extraction attack**:
   - Attackers can query the API provided by an EaaS, collect the returned embeddings, and train their own model on them, thereby replicating the functionality of the service.
   - This lets attackers offer a competing service at lower cost and with fewer resources, which threatens EaaS providers.
2. **Limitations of existing watermarking techniques**:
   - Existing watermarking techniques (such as EmbMarker) add a watermark to the embeddings so that copyright can be verified, but attackers can remove it with targeted methods (such as the CSE attack).
   - After removing the watermark, attackers still obtain high-quality embeddings, and EaaS providers can no longer detect the infringement.

#### Main contributions of the paper

1. **Propose the CSE attack framework**:
   - The CSE (Clustering, Selection, Elimination) attack framework identifies and removes the watermark in the embeddings while maintaining their high utility.
   - The framework progressively filters and processes embedding vectors that may contain the watermark in three steps: clustering, selection, and elimination.
2. **Design the WARDEN defense mechanism**:
   - WARDEN is a multi-directional watermarking mechanism. By introducing multiple watermark vectors, it makes it harder for attackers to remove all watermarks.
   - WARDEN improves the stealthiness of the watermarks and is shown to defend effectively against the CSE attack on various datasets.
   - WARDEN also includes a verification protocol in which each watermark can independently verify copyright infringement.
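
To illustrate why multiple watermark vectors make removal harder, the following is a minimal sketch and not the paper's implementation. It assumes that the elimination step amounts to projecting out a direction the attacker has identified, and that watermarking is a fixed-weight mix of the original embedding with the watermark vectors. The names `project_out`, `w1`, `w2`, and the weights 0.6/0.2/0.2 are hypothetical choices made for this example.

```python
import numpy as np

def normalize(x):
    """Scale each row (or a single vector) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def project_out(embeddings, direction):
    """Remove the component of every embedding along `direction`.

    Assumption: this stands in for the attacker's elimination step.
    """
    d = normalize(direction)
    return embeddings - np.outer(embeddings @ d, d)

rng = np.random.default_rng(0)
dim = 64
clean = normalize(rng.normal(size=(100, dim)))  # stand-in for original embeddings
w1 = normalize(rng.normal(size=dim))            # hypothetical watermark direction 1
w2 = normalize(rng.normal(size=dim))            # hypothetical watermark direction 2

# Mix both watermark directions into every embedding (arbitrary fixed weights).
marked = normalize(0.6 * clean + 0.2 * w1 + 0.2 * w2)

# Suppose the attacker identifies and removes only the first direction.
attacked = normalize(project_out(marked, w1))

sim_w1 = float(np.mean(attacked @ w1))  # mean cosine similarity to w1 after the attack
sim_w2 = float(np.mean(attacked @ w2))  # mean cosine similarity to w2 after the attack
print(f"w1: {sim_w1:.4f}, w2: {sim_w2:.4f}")
```

In this toy setting `sim_w1` is numerically zero while `sim_w2` stays clearly positive, so a verifier that checks each direction independently could still detect the second watermark.
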
#### Formula summary

- **Distance difference calculation in the CSE attack**:

  \[ D_p = \text{Rank}(D_v) - \text{Rank}(D_s) \]

  where \(D_v\) and \(D_s\) are the cosine similarity differences of the victim model and the standard model, respectively.

- **Multi-directional watermark generation formula in WARDEN**:

  \[ \text{Norm}\left(\left(1 - \sum_{r=1}^{R} \lambda_r(S)\right) \cdot e_o + \sum_{r=1}^{R} \lambda_r(S) \cdot w_r\right) \]

  where \(e_o\) is the original embedding vector, \(w_r\) is the \(r\)-th watermark vector, \(R\) is the number of watermark vectors, and \(\lambda_r(S)\) is the trigger-word frequency function.

Together, these methods expose a weakness in existing EaaS watermarks and provide a stronger copyright protection mechanism for embedding services.
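
The two formulas can be written out directly in NumPy. This is a sketch under stated assumptions rather than the authors' code: `Rank` is taken to be a 0-based ascending rank, `Norm` is taken to be L2 normalization, and the weights \(\lambda_r(S)\) are passed in as precomputed numbers because the summary does not specify how they are derived from trigger words. The function names `rank`, `rank_difference`, and `warden_embedding` are hypothetical.

```python
import numpy as np

def rank(values):
    """Return the 0-based ascending rank of each entry (ties broken by position)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(len(values))
    return ranks

def rank_difference(d_victim, d_standard):
    """Compute D_p = Rank(D_v) - Rank(D_s) elementwise."""
    return rank(d_victim) - rank(d_standard)

def warden_embedding(e_o, watermarks, lambdas):
    """Compute Norm((1 - sum_r lambda_r) * e_o + sum_r lambda_r * w_r).

    e_o:        original embedding, shape (dim,)
    watermarks: watermark vectors w_r stacked as rows, shape (R, dim)
    lambdas:    precomputed weights lambda_r(S), shape (R,) (assumed given)
    """
    lambdas = np.asarray(lambdas, dtype=float)
    mixed = (1.0 - lambdas.sum()) * e_o + lambdas @ watermarks
    return mixed / np.linalg.norm(mixed)

# Toy inputs, chosen only to exercise the functions.
d_v = np.array([0.9, 0.1, 0.5])
d_s = np.array([0.2, 0.1, 0.5])
d_p = rank_difference(d_v, d_s)

e_o = np.array([1.0, 0.0, 0.0])
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
unmarked = warden_embedding(e_o, W, [0.0, 0.0])    # no trigger weight: e_o unchanged
fully_marked = warden_embedding(e_o, W, [0.5, 0.5])  # weights sum to 1: only watermarks remain
```

With all weights at zero the embedding is returned unchanged, and as the weights grow the result moves toward the span of the watermark vectors, which is the behavior the formula describes.
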

WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

Warfare: Breaking the Watermark Protection of AI-Generated Content

ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Robust Blind Video Watermarking with Adaptive Embedding Mechanism

MEA-Defender: A Robust Watermark against Model Extraction Attack

Investigating Deep Watermark Security: An Adversarial Transferability Perspective

WaterPark: A Robustness Assessment of Language Model Watermarking

On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Deep Neural Network Watermarking Against Model Extraction Attack

Watermarking in Secure Federated Learning: A Verification Framework Based on Client-Side Backdooring

Watermark Stealing in Large Language Models

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection

Dual Defense: Adversarial, Traceable, and Invisible Robust Watermarking Against Face Swapping

Persistent and Unforgeable Watermarks for Deep Neural Networks

E-SAWM: A Semantic Analysis-Based ODF Watermarking Algorithm for Edge Cloud Scenarios

Generating Image Adversarial Examples by Embedding Digital Watermarks