Abstract: Training deep neural networks from scratch can be computationally expensive and requires a large amount of training data. Recent work has explored different watermarking techniques to protect pre-trained deep neural networks from potential copyright infringement. However, these techniques could be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaptation of the elastic weight consolidation (EWC) algorithm, which was originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation shows the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled data samples used for AU do not necessarily need to be drawn from the same distribution as the benign data used for model evaluation. The experimental results demonstrate that our fine-tuning-based watermark removal attacks could pose real threats to the copyright of pre-trained models, and thus highlight the importance of further investigating the watermarking problem and proposing watermark embedding schemes that are more robust against such attacks.
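For readers unfamiliar with EWC, the following is a minimal sketch of the standard EWC quadratic penalty, which adds (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2 to the task loss. It illustrates the general idea only and is not the adaptation used in REFIT, whose details the abstract does not give. The function names, the toy weights, the diagonal Fisher estimates, and the value of `lam` are all hypothetical.

```python
def ewc_penalty(theta, theta_star, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )


def regularized_loss(task_loss, theta, theta_star, fisher, lam):
    """Fine-tuning loss on the small labeled set plus the EWC penalty."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy values (hypothetical): three parameters with diagonal Fisher estimates.
theta_star = [1.0, -2.0, 0.5]  # pre-trained weights before fine-tuning
theta = [1.5, -2.0, 0.0]       # weights after some fine-tuning steps
fisher = [4.0, 1.0, 0.0]       # estimated importance of each weight

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
total = regularized_loss(0.3, theta, theta_star, fisher, lam=2.0)
```

In this toy case the third weight moves freely because its Fisher estimate is zero, while the first weight is penalized for drifting from its pre-trained value. That is the mechanism by which EWC limits forgetting when only a small labeled set is available for fine-tuning.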

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

REFIT: A Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

On Function-Coupled Watermarks for Deep Neural Networks

Removing Backdoor-Based Watermarks in Neural Networks with Limited Data

Embedding Watermarks into Deep Neural Networks

Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data

Making Watermark Survive Model Extraction Attacks in Graph Neural Networks

Deep Neural Network Watermarking Against Model Extraction Attack

Digital watermarking for deep neural networks

Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks

Structural Watermarking to Deep Neural Networks Via Network Channel Pruning

Watermarking in Deep Neural Networks Via Error Back-propagation

Watermarking Neural Networks with Watermarked Images

On the Robustness of the Backdoor-based Watermarking in Deep Neural Networks

Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation

Deep Model Intellectual Property Protection Via Deep Watermarking

Subnetwork-Lossless Robust Watermarking for Hostile Theft Attacks in Deep Transfer Learning Models

Persistent and Unforgeable Watermarks for Deep Neural Networks

Deep neural networks watermark via universal deep hiding and metric learning

FreeMark: A Non-Invasive White-Box Watermarking for Deep Neural Networks