Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data

Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo
2024-07-19
Abstract: With the increasing prevalence of Machine Learning as a Service (MLaaS) platforms, there is a growing focus on deep neural network (DNN) watermarking techniques. These methods are used to verify ownership of a target DNN model and so protect intellectual property. One of the most widely employed watermarking techniques involves embedding a trigger set into the source model. Unfortunately, existing methodologies based on trigger sets are still susceptible to functionality-stealing attacks, potentially enabling adversaries to steal the functionality of the source model without a reliable means of verifying ownership. In this paper, we first examine trigger set-based watermarking methods from a feature learning perspective. Specifically, we demonstrate that by selecting data exhibiting multiple features, also referred to as *multi-view data*, it becomes feasible to effectively defend against functionality-stealing attacks. Based on this perspective, we introduce a novel watermarking technique based on Multi-view dATa, called MAT, for efficiently embedding watermarks within DNNs. This approach involves constructing a trigger set with multi-view data and incorporating a simple feature-based regularization method for training the source model. We validate our method across various benchmarks and demonstrate its efficacy in defending against model extraction attacks, surpassing relevant baselines by a significant margin. The code is available at: https://github.com/liyuxuan-github/MAT.
Cryptography and Security, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to address is how to protect the intellectual property of deep neural network (DNN) models on Machine Learning as a Service (MLaaS) platforms against functionality-stealing attacks. Existing trigger set-based watermarking techniques can verify model ownership, but the watermarks they embed are easily removed by functionality-stealing attacks. The paper therefore proposes a watermarking method based on multi-view data that makes the watermark more robust to such attacks.

### Main Issues:

1. **Defense against functionality-stealing attacks**: Watermarks embedded by existing trigger set-based methods are easily removed by functionality-stealing attacks, which makes it impossible to verify model ownership reliably.
2. **Improving watermark robustness**: A new method is needed so that the watermark still identifies the model owner after a functionality-stealing attack.

### Solution:

1. **Introduction of multi-view data**: Building the trigger set from data that exhibits multiple features (multi-view data) makes it difficult for attackers to remove the watermark during functionality stealing.
2. **Feature regularization loss**: Adding a feature regularization loss during training encourages the model to learn the multi-view features of the trigger set, which improves the robustness of the watermark.

### Specific Method:

- **Selection of multi-view data**: A logit margin loss is used to select samples with multi-view features as the trigger set.
- **Feature regularization**: A feature regularization loss is minimized so that the features of trigger set samples move closer to the class center of their modified label.

### Experimental Results:

- **Experiments on CIFAR-10, CIFAR-100, and ImageNet datasets**: The proposed MAT method outperforms baseline methods under both soft-label and hard-label model extraction attacks. Notably, MAT still maintains high trigger set accuracy under hard-label model extraction attacks.

### Summary:

This paper proposes a novel watermarking method (MAT) based on multi-view data. It improves the ability to verify ownership of deep neural networks under functionality-stealing attacks, addressing the weakness of existing methods whose watermarks are easily removed by such attacks.
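
The two steps listed under Specific Method, together with the trigger set accuracy used in the experiments, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The summary does not give the exact selection criterion or loss, so the sketch assumes that a small logit margin (labelled-class logit minus the largest competing logit) serves as a proxy for multi-view data, and that the regularizer is a mean squared distance to the class center. All function names here are hypothetical.

```python
from typing import List, Sequence


def logit_margin(logits: Sequence[float], label: int) -> float:
    """Logit of the labelled class minus the largest competing logit."""
    best_other = max(z for k, z in enumerate(logits) if k != label)
    return logits[label] - best_other


def select_trigger_indices(
    all_logits: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[int]:
    """Indices of the k samples with the smallest logit margin.

    ASSUMPTION: a small margin is treated as a sign that a sample carries
    features of more than one class (multi-view data).
    """
    margins = [logit_margin(z, y) for z, y in zip(all_logits, labels)]
    return sorted(range(len(margins)), key=lambda i: margins[i])[:k]


def class_center(
    features: Sequence[Sequence[float]], labels: Sequence[int], target: int
) -> List[float]:
    """Mean feature vector of the samples labelled `target`."""
    rows = [f for f, y in zip(features, labels) if y == target]
    if not rows:
        raise ValueError(f"no samples with label {target}")
    return [sum(col) / len(rows) for col in zip(*rows)]


def feature_reg_loss(
    trigger_features: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
) -> float:
    """Mean squared distance from each trigger feature to the center of
    its modified-label class (ASSUMED form of the regularizer)."""
    total = 0.0
    for f, c in zip(trigger_features, centers):
        total += sum((a - b) ** 2 for a, b in zip(f, c))
    return total / len(trigger_features)


def trigger_set_accuracy(
    predictions: Sequence[int], modified_labels: Sequence[int]
) -> float:
    """Fraction of trigger samples a model assigns to their modified label."""
    hits = sum(p == y for p, y in zip(predictions, modified_labels))
    return hits / len(modified_labels)
```

In a training loop, `feature_reg_loss` would be added to the usual classification loss with a weighting coefficient, and `trigger_set_accuracy` would be evaluated on a suspect model to decide whether the watermark survived extraction. Both the weighting and any decision threshold are choices this summary does not specify.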