Abstract: Deep neural networks (DNNs) are widely used in real-world applications thanks to their exceptional performance in image recognition. However, they are vulnerable to attacks such as Trojan and data-poisoning attacks, which can compromise the integrity and stability of DNN applications. It is therefore crucial to verify the integrity of DNN models to ensure their security. Previous work on model watermarking for integrity verification overexposes model parameters during watermark embedding and extraction. To address this problem, we propose a novel score-based black-box fragile watermarking framework for DNNs called fragile trigger generation (FTG). FTG requires only the prediction probability distribution of the classifier's final output during the watermarking process. It generates fragile samples to serve as the trigger set, guided by the target classifier's prediction probabilities and a specified prediction probability mask, and uses them to watermark the classifier. Different prediction probability masks promote the generation of fragile samples with correspondingly different distribution types. The watermarking process does not affect the performance of the target classifier. To verify the watermark, FTG only needs to compare the model's predictions on the trigger samples with the labels recorded during watermarking. As a result, FTG requires less information about model parameters and needs only a few samples to detect slight modifications to the model. Experimental results demonstrate the effectiveness of the proposed method and its superiority over related work. FTG thus offers a practical solution for verifying the integrity of DNN models, and its ability to detect slight modifications makes it a valuable tool for ensuring the security and stability of DNN applications.
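The abstract describes the two phases of score-based black-box fragile watermarking only at a high level. Below is a minimal Python sketch of both phases under loudly stated assumptions. The protected classifier is reachable only through a score oracle that returns class probabilities. The "prediction probability mask" is modeled as a target probability vector. Trigger generation uses plain zeroth-order random search as a stand-in optimizer; this is NOT the generator proposed by the FTG authors. The toy linear model and every function name here (`make_linear_model`, `mask_loss`, `generate_fragile_sample`, `predicted_label`, `verify_integrity`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Assumption: black-box, score-based access only (input -> class probabilities).
ScoreFn = Callable[[Vector], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_model(weights: List[Vector], biases: Vector) -> ScoreFn:
    """Hypothetical toy classifier standing in for the protected DNN."""
    def score(x: Vector) -> List[float]:
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        return softmax(logits)
    return score


def mask_loss(probs: Sequence[float], mask: Sequence[float]) -> float:
    """Squared distance between the model's output and the target mask (assumed loss)."""
    return sum((p - m) ** 2 for p, m in zip(probs, mask))


def generate_fragile_sample(score: ScoreFn, x0: Vector, mask: Vector,
                            steps: int = 2000, step_size: float = 0.05,
                            seed: int = 0) -> Vector:
    """Stand-in optimizer: random search that queries only output probabilities
    (no gradients, no parameters). A Gaussian perturbation is kept only if it
    moves the output distribution closer to the mask."""
    rng = random.Random(seed)
    x = list(x0)
    best = mask_loss(score(x), mask)
    for _ in range(steps):
        cand = [xi + rng.gauss(0.0, step_size) for xi in x]
        loss = mask_loss(score(cand), mask)
        if loss < best:
            x, best = cand, loss
    return x


def predicted_label(score: ScoreFn, x: Vector) -> int:
    """Top-1 class index (first index wins on ties)."""
    probs = score(x)
    return max(range(len(probs)), key=probs.__getitem__)


def verify_integrity(score: ScoreFn, triggers: List[Vector],
                     recorded_labels: List[int]) -> bool:
    """The model passes only if every trigger still gets its recorded label."""
    return len(triggers) == len(recorded_labels) and all(
        predicted_label(score, x) == y for x, y in zip(triggers, recorded_labels))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    model = make_linear_model(W, [0.0, 0.0, 0.0])
    # Hypothetical mask: near-equal mass on classes 0 and 1 pushes each
    # trigger toward the 0/1 decision boundary.
    mask = [0.51, 0.49, 0.0]
    triggers = [generate_fragile_sample(model, [1.0, 0.0], mask, seed=s)
                for s in range(3)]
    labels = [predicted_label(model, x) for x in triggers]  # recorded at watermarking time
    print("intact model passes:", verify_integrity(model, triggers, labels))
    # A "slight modification": nudge one bias by 0.05.
    tampered = make_linear_model(W, [0.0, 0.05, 0.0])
    print("tampered model passes:", verify_integrity(tampered, triggers, labels))
```

The intact model always reproduces its recorded labels, because verification repeats the same deterministic query. A mask with nearly equal mass on two classes places each trigger close to a decision boundary, which is what makes it fragile. Under that assumption, a small parameter change such as the 0.05 bias nudge in the demo can be enough to flip a trigger's label and fail verification.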

Semi-Fragile Neural Network Watermarking Based on Adversarial Examples

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

Semi-Fragile Neural Network Watermarking for Content Authentication and Tampering Localization

Fragile Model Watermark for Integrity Protection: Leveraging Boundary Volatility and Sensitive Sample-Pairing

Fragile Neural Network Watermarking with Trigger Image Set

Neural Network Fragile Watermarking with No Model Performance Degradation

Deep Neural Network Watermarking Against Model Extraction Attack

Convolutional Neural Networks Tamper Detection and Location Based on Fragile Watermarking

FTG: Score-based Black-Box Watermarking by Fragile Trigger Generation for Deep Model Integrity Verification

Verifying Integrity of Deep Ensemble Models by Lossless Black-box Watermarking with Sensitive Samples

Making Watermark Survive Model Extraction Attacks in Graph Neural Networks

A Survey of Fragile Model Watermarking

Rethinking the Vulnerability of DNN Watermarking: Are Watermarks Robust against Naturalness-aware Perturbations?

Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Generating Image Adversarial Examples by Embedding Digital Watermarks

Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation

Fused Pruning Based Robust Deep Neural Network Watermark Embedding

Protecting the Intellectual Property of Deep Neural Networks with Watermarking: The Frequency Domain Approach

Watermarking in Deep Neural Networks via Error Back-propagation