Watermarking Recommender Systems

Sixiao Zhang, Cheng Long, Wei Yuan, Hongxu Chen, Hongzhi Yin
DOI: https://doi.org/10.1145/3627673.3679617
2024-09-30
Abstract: Recommender systems embody significant commercial value and represent crucial intellectual property. However, the integrity of these systems is constantly challenged by malicious actors seeking to steal their underlying models. Safeguarding against such threats is paramount to upholding the rights and interests of the model owner. While model watermarking has emerged as a potent defense mechanism in various domains, its direct application to recommender systems remains unexplored and non-trivial. In this paper, we address this gap by introducing Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems. Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores. This iterative process generates a watermark sequence autoregressively, which is then ingrained into the model's memory through training. To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence. Through extensive experimentation and analysis, we demonstrate the superior performance and robust properties of AOW. Notably, our watermarking technique exhibits high-confidence extraction capabilities and maintains effectiveness even in the face of distillation and fine-tuning processes.
Information Retrieval, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper attempts to address the issues of model theft and leakage in recommendation systems. Specifically, the authors focus on how to protect the intellectual property of recommendation systems and prevent malicious actors from stealing their underlying models. Although model watermarking techniques have been extensively studied in other fields (such as computer vision), directly applying them to recommendation systems remains an unexplored and non-trivial task. Therefore, this paper proposes Autoregressive Out-of-distribution Watermarking (AOW), a new technique specifically designed for recommendation systems.

### Solution

1. **Problem Definition**:
   - Given a set of users \( U \) and a set of items \( I \), each user is associated with a series of interacted items \( S_u = \{i_u^1, i_u^2, \ldots\} \).
   - Use these sequences to train a recommendation model \( f \), referred to as the oracle model.
   - The goal is to design an additional sequence \( S_{wm} \) as a watermark, and train a new watermark model \( f_{wm} \) with the original dataset \( S \) and the watermark sequence \( S_{wm} \), so that it can remember the watermark sequence \( S_{wm} \) while maintaining good recommendation performance.
2. **Challenges**:
   - **Model Utility**: The performance of the model should be minimally affected after watermark injection.
   - **Watermark Effectiveness**: The confidence of the watermark in the watermark model should be high, while it should be low in non-watermarked models.
   - **Robustness**: The watermark should resist removal attacks such as distillation and fine-tuning.
3. **Solution**:
   - **Black-box vs. White-box**: Choose black-box watermarking because it is not always possible to access the parameters of the suspicious model.
   - **Out-of-distribution vs. In-distribution**: Choose out-of-distribution watermarking because in-distribution watermarking would reduce model utility.
   - **Choice of Watermark Pattern**: Do not use fake items, but use existing items to form a special input-output mapping.
   - **AOW Method**: Generate the entire watermark sequence \( S_{wm} = \{i_{wm}^1, i_{wm}^2, \ldots, i_{wm}^n\} \) through an autoregressive method. The specific steps are as follows:
     1. Train an oracle model from the original dataset.
     2. Select an initial item \( i_{wm}^1 \).
     3. Query the oracle model with this item to get the prediction scores for all items.
     4. Select one of the lowest-scoring items as the next watermark item \( i_{wm}^2 \).
     5. Repeat the above process until the watermark sequence reaches the preset length \( n \).
     6. Train a new watermark model \( f_{wm} \) with the watermark sequence and the original dataset.

### Experimental Results

1. **Watermark Effectiveness and Model Utility**:
   - The watermark achieves 100% Recall@1 on all datasets, indicating that the watermark can be effectively retained by the target model.
   - AOW significantly outperforms the GRO method in protecting model utility.
2. **Robustness**:
   - The watermark shows high robustness after model distillation and fine-tuning.
3. **Hyperparameter Study**:
   - A detailed analysis of the impact of hyperparameters such as watermark sequence length, initial item selection, and the ratio of watermark to data on performance.

Through these experiments, the authors demonstrate the effectiveness and robustness of the AOW method, providing a new solution for the intellectual property protection of recommendation systems.
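The AOW generation steps and the truncated-sequence check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: `score_fn` is a hypothetical callable standing in for a trained model (it takes an item-ID prefix and returns one score per item), the model is queried with the whole prefix built so far, "one of the lowest-scoring items" is read as a random pick among the `k` lowest scores, and items already in the sequence are never reused. `make_memorizing_model` is only a toy stand-in for step 6 (training \( f_{wm} \)), so that the check has something to run against.

```python
import numpy as np


def generate_watermark(score_fn, first_item, length, k=5, seed=0):
    """Build a watermark sequence autoregressively from low-scoring items.

    score_fn(prefix) -> 1-D array of scores over all items (hypothetical API).
    """
    rng = np.random.default_rng(seed)
    seq = [first_item]
    while len(seq) < length:
        scores = np.asarray(score_fn(seq), dtype=float).copy()
        scores[seq] = np.inf  # assumption: never reuse an item already chosen
        candidates = np.argsort(scores)[:k]  # the k lowest-scoring items
        seq.append(int(rng.choice(candidates)))
    return seq


def watermark_recall_at_1(score_fn, seq):
    """Fraction of truncated prefixes whose top-1 prediction is the next watermark item."""
    hits = 0
    for t in range(1, len(seq)):
        predicted = int(np.argmax(score_fn(seq[:t])))
        hits += predicted == seq[t]
    return hits / (len(seq) - 1)


def make_memorizing_model(oracle_fn, seq):
    """Toy stand-in for a watermarked model: oracle scores, plus a boost for
    the next watermark item whenever the input is a watermark prefix."""
    next_item = {tuple(seq[:t]): seq[t] for t in range(1, len(seq))}

    def score_fn(prefix):
        scores = np.asarray(oracle_fn(prefix), dtype=float).copy()
        key = tuple(prefix)
        if key in next_item:
            scores[next_item[key]] = scores.max() + 1.0
        return scores

    return score_fn
```

Under these assumptions, a model that has memorized the sequence scores a Recall@1 of 1.0 on the truncated prefixes, while the oracle that generated it scores low, which is the gap the watermark effectiveness criterion asks for. In practice `score_fn` would wrap a real sequential recommender and the watermark model would be obtained by training on the original data plus the watermark sequence.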