Fragile Neural Network Watermarking with Trigger Image Set

Renjie Zhu, Ping Wei, Sheng Li, Zhaoxia Yin, Xinpeng Zhang, Zhenxing Qian
DOI: https://doi.org/10.1007/978-3-030-82136-4_23
2021-01-01
Abstract: Recent studies show that deep neural networks are vulnerable to data poisoning and backdoor attacks, both of which involve malicious fine-tuning of deep models. In this paper, we propose, for the first time, a black-box fragile neural network watermarking method for detecting malicious fine-tuning. The watermarking process consists of three steps. First, a set of trigger images is constructed from a user-specific secret key. Second, a well-trained DNN model is fine-tuned, in a two-stage alternating training manner, to classify the normal images in the training set and the trigger images in the trigger set simultaneously. This embeds the fragile watermark while preserving the model's original classification ability. The watermarked model is sensitive to malicious fine-tuning: once tampered with, it produces unstable classification results on the trigger images. Finally, the integrity of the network model is verified by feeding it the trigger image set and analyzing its output. Experiments on three benchmark datasets demonstrate that the proposed watermarking method is effective in detecting malicious fine-tuning.
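The first and last steps of the process (key-derived trigger set, black-box verification) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how trigger images are generated from the key, so seeded pseudo-random noise images are assumed here, and the names `make_trigger_set`, `verify_integrity`, and `predict` are hypothetical. The two-stage alternating fine-tuning step is omitted.

```python
import hashlib
import numpy as np


def make_trigger_set(secret_key, n_images=8, shape=(32, 32, 3), n_classes=10):
    """Derive trigger images and target labels deterministically from a key.

    Assumption: triggers are seeded noise images; the paper may construct
    them differently.
    """
    # Hash the key so that the same key always reproduces the same trigger set.
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)
    images = rng.random((n_images, *shape), dtype=np.float32)
    labels = rng.integers(0, n_classes, size=n_images)
    return images, labels


def verify_integrity(predict, images, labels):
    """Black-box check: the model counts as intact only if every trigger
    image still receives its expected label.

    `predict` is any callable mapping a batch of images to class indices.
    """
    predicted = np.asarray(predict(images))
    return bool(np.array_equal(predicted, labels))


# Usage with stand-in "models" (plain functions instead of a real DNN).
images, labels = make_trigger_set("user-secret")
intact_model = lambda batch: labels               # reproduces every trigger label
tampered_model = lambda batch: (labels + 1) % 10  # trigger outputs have drifted
print(verify_integrity(intact_model, images, labels))
print(verify_integrity(tampered_model, images, labels))
```

Because verification needs only the model's predicted labels on the trigger set, it requires no access to the weights, which is what makes the scheme black-box. The all-or-nothing comparison is a simplification; a real check might instead threshold the fraction of mismatched trigger labels.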