Abstract: Recommender systems embody significant commercial value and represent crucial intellectual property. However, the integrity of these systems is constantly challenged by malicious actors seeking to steal their underlying models. Safeguarding against such threats is paramount to upholding the rights and interests of the model owner. While model watermarking has emerged as a potent defense mechanism in various domains, its direct application to recommender systems remains unexplored and non-trivial. In this paper, we address this gap by introducing Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems. Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores. This iterative process generates a watermark sequence autoregressively, which is then ingrained into the model's memory through training. To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence. Through extensive experimentation and analysis, we demonstrate the superior performance and robust properties of AOW. Notably, our watermarking technique exhibits high-confidence extraction capabilities and maintains effectiveness even in the face of distillation and fine-tuning processes.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper attempts to address the issues of model theft and leakage in recommendation systems. Specifically, the authors focus on how to protect the intellectual property of recommendation systems and prevent malicious actors from stealing their underlying models. Although model watermarking techniques have been extensively studied in other fields (such as computer vision), directly applying them to recommendation systems remains an unexplored and non-trivial task. Therefore, this paper proposes Autoregressive Out-of-distribution Watermarking (AOW), a new technique specifically designed for recommendation systems.

### Solution

1. **Problem Definition**:
   - Given a set of users \( U \) and a set of items \( I \), each user is associated with a series of interacted items \( S_u = \{i_u^1, i_u^2, \ldots\} \).
   - Use these sequences to train a recommendation model \( f \), referred to as the oracle model.
   - The goal is to design an additional sequence \( S_{wm} \) as a watermark, and train a new watermark model \( f_{wm} \) with the original dataset \( S \) and the watermark sequence \( S_{wm} \), so that it can remember the watermark sequence \( S_{wm} \) while maintaining good recommendation performance.
2. **Challenges**:
   - **Model Utility**: The performance of the model should be minimally affected after watermark injection.
   - **Watermark Effectiveness**: The confidence of the watermark in the watermark model should be high, while it should be low in non-watermarked models.
   - **Robustness**: The watermark should resist removal attacks such as distillation and fine-tuning.
3. **Solution**:
   - **Black-box vs. White-box**: Choose black-box watermarking because it is not always possible to access the parameters of the suspicious model.
   - **Out-of-distribution vs. In-distribution**: Choose out-of-distribution watermarking because in-distribution watermarking would reduce model utility.
   - **Choice of Watermark Pattern**: Do not use fake items, but use existing items to form a special input-output mapping.
   - **AOW Method**: Generate the entire watermark sequence \( S_{wm} = \{i_{wm}^1, i_{wm}^2, \ldots, i_{wm}^n\} \) through an autoregressive method. The specific steps are as follows:
     1. Train an oracle model from the original dataset.
     2. Select an initial item \( i_{wm}^1 \).
     3. Query the oracle model with this item to get the prediction scores for all items.
     4. Select one of the lowest-scoring items as the next watermark item \( i_{wm}^2 \).
     5. Repeat the above process until the watermark sequence reaches the preset length \( n \).
     6. Train a new watermark model \( f_{wm} \) with the watermark sequence and the original dataset.

### Experimental Results

1. **Watermark Effectiveness and Model Utility**:
   - The watermark achieves 100% Recall@1 on all datasets, indicating that the watermark can be effectively retained by the target model.
   - AOW significantly outperforms the GRO method in protecting model utility.
2. **Robustness**:
   - The watermark shows high robustness after model distillation and fine-tuning.
3. **Hyperparameter Study**:
   - A detailed analysis of the impact of hyperparameters such as watermark sequence length, initial item selection, and the ratio of watermark to data on performance.

Through these experiments, the authors demonstrate the effectiveness and robustness of the AOW method, providing a new solution for the intellectual property protection of recommendation systems.
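
The AOW generation steps and the truncated-sequence check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The scoring interface `ScoreFn`, the functions `generate_watermark` and `watermark_recall_at_1`, the parameter `k` (how many lowest-scoring candidates to sample from), and the toy models are all hypothetical names introduced here. The sketch also assumes that the oracle is queried with the whole prefix generated so far and that items are not repeated within the watermark; the summary does not confirm either detail. Training of \( f_{wm} \) is replaced by a stand-in that simply memorizes the sequence.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a model maps an item prefix to one score per item id.
ScoreFn = Callable[[Sequence[int]], List[float]]

NUM_ITEMS = 50  # toy catalogue size (assumption for this example)


def generate_watermark(score_fn: ScoreFn, num_items: int, first_item: int,
                       length: int, k: int = 3, seed: int = 0) -> List[int]:
    """Build a watermark autoregressively from low-scoring items."""
    rng = random.Random(seed)
    seq = [first_item]
    while len(seq) < length:
        scores = score_fn(seq)  # query the oracle with the current prefix
        used = set(seq)
        # Assumption: skip items that are already part of the watermark.
        candidates = [i for i in range(num_items) if i not in used]
        candidates.sort(key=lambda i: scores[i])  # lowest score first
        seq.append(rng.choice(candidates[:k]))    # one of the k lowest-scoring
    return seq


def watermark_recall_at_1(score_fn: ScoreFn, watermark: Sequence[int]) -> float:
    """Fraction of truncated prefixes whose top-1 prediction is the next item."""
    hits = 0
    for t in range(1, len(watermark)):
        scores = score_fn(watermark[:t])
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        hits += int(predicted == watermark[t])
    return hits / (len(watermark) - 1)


def toy_oracle(prefix: Sequence[int]) -> List[float]:
    """Toy stand-in for the oracle: prefers items with ids close to the last one."""
    last = prefix[-1]
    return [-abs(i - last) for i in range(NUM_ITEMS)]


def make_memorizing_model(watermark: Sequence[int]) -> ScoreFn:
    """Toy stand-in for a trained watermark model: oracle plus memorized sequence."""
    wm = list(watermark)

    def score(prefix: Sequence[int]) -> List[float]:
        scores = toy_oracle(prefix)
        t = len(prefix)
        if t < len(wm) and list(prefix) == wm[:t]:
            scores[wm[t]] = 1.0  # higher than every toy oracle score (all <= 0)
        return scores

    return score


wm = generate_watermark(toy_oracle, NUM_ITEMS, first_item=0, length=8)
print("watermark:", wm)
print("Recall@1, memorizing model:", watermark_recall_at_1(make_memorizing_model(wm), wm))
print("Recall@1, plain oracle:", watermark_recall_at_1(toy_oracle, wm))
```

In this toy setting the memorizing model recovers every next item while the plain oracle recovers none, which mirrors the effectiveness criterion in the summary: high watermark confidence in the watermarked model and low confidence in non-watermarked ones.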

Watermarking Recommender Systems

REFIT: A Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

Data Watermarking for Sequential Recommender Systems

Warfare: Breaking the Watermark Protection of AI-Generated Content

On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models

Watermarking Language Models for Many Adaptive Users

Cosine Model Watermarking Against Ensemble Distillation

A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

ModelShield: Adaptive and Robust Watermark against Model Extraction Attack

A Robust Image Watermarking Algorithm Based on Content Authentication and Intelligent Optimization

A Watermark-Conditioned Diffusion Model for IP Protection

WaterPark: A Robustness Assessment of Language Model Watermarking

Towards Robust Model Watermark Via Reducing Parametric Vulnerability

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

A Certified Robust Watermark For Large Language Models

On the Reliability of Watermarks for Large Language Models

Suppressing High-Frequency Artifacts for Generative Model Watermarking by Anti-Aliasing

A Survey of Fragile Model Watermarking