Abstract: Deep neural networks (DNNs) are widely used in real-world applications thanks to their exceptional performance in image recognition. However, their vulnerability to attacks such as Trojan attacks and data poisoning can compromise the integrity and stability of DNN applications, so verifying the integrity of DNN models is crucial to their security. Previous model watermarking methods for integrity detection expose too many model parameters while embedding and extracting the watermark. To address this problem, we propose a novel score-based black-box DNN fragile watermarking framework called fragile trigger generation (FTG). During watermarking, FTG requires only the prediction probability distribution output by the classifier. It generates fragile samples that serve as the trigger, based on the target classifier's prediction probabilities and a specified prediction probability mask. Different prediction probability masks promote the generation of fragile samples of the corresponding distribution types. The watermarking process does not affect the performance of the target classifier. To verify the watermark, FTG only needs to compare the model's predictions on the trigger samples with their previously recorded labels. As a result, less model parameter information is required, and FTG needs only a few samples to detect slight modifications to the model. Experimental results demonstrate the effectiveness of the proposed method and its superiority over related work. The FTG framework provides a robust solution for verifying the integrity of DNN models, and its effectiveness in detecting slight modifications makes it a valuable tool for ensuring the security and stability of DNN applications.
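The verification step described in the abstract (comparing the model's predictions on the trigger samples with the previously recorded labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `predicted_label`, `verify_integrity`, and `make_toy_model` are hypothetical, the toy classifier is a one-dimensional two-class model invented for illustration, and the trigger samples are hand-picked near its decision boundary rather than generated with a prediction probability mask.

```python
import math
from typing import Callable, List, Sequence


def predicted_label(scores: Sequence[float]) -> int:
    """Return the index of the highest class score (arg-max)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def verify_integrity(
    model: Callable[[float], Sequence[float]],
    triggers: Sequence[float],
    recorded_labels: Sequence[int],
) -> bool:
    """Return True only if every trigger still receives its recorded label.

    Only the model's output scores are queried (black-box access);
    no model parameters are read.
    """
    if len(triggers) != len(recorded_labels):
        raise ValueError("triggers and recorded_labels must have equal length")
    return all(
        predicted_label(model(x)) == y
        for x, y in zip(triggers, recorded_labels)
    )


def make_toy_model(threshold: float) -> Callable[[float], List[float]]:
    """Hypothetical two-class classifier: class 1 is favored when x > threshold."""
    def model(x: float) -> List[float]:
        p1 = 1.0 / (1.0 + math.exp(-(x - threshold)))
        return [1.0 - p1, p1]
    return model


# Assumed setup: triggers chosen by hand close to the decision boundary at 0.5,
# so a small change to the model flips at least one of their labels.
original = make_toy_model(threshold=0.50)
triggers = [0.49, 0.51]
recorded = [predicted_label(original(x)) for x in triggers]

# A slightly modified model (boundary shifted by 0.02) stands in for tampering.
tampered = make_toy_model(threshold=0.52)

intact_ok = verify_integrity(original, triggers, recorded)
tampered_ok = verify_integrity(tampered, triggers, recorded)
```

In this toy setting the unmodified model passes the check, while the shifted boundary changes the label of the trigger at 0.51 and the check fails. Samples far from the boundary would keep their labels under the same shift, which is why fragile triggers need to sit close to it.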

Semi-fragile Neural Network Watermarking for Content Authentication and Tampering Localization

Semi-Fragile Neural Network Watermarking Based on Adversarial Examples

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Neural network fragile watermarking with no model performance degradation

Fragile Neural Network Watermarking with Trigger Image Set

A Survey of Fragile Model Watermarking

Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Adaptive watermarking with self-mutual check parameters in deep neural networks

FTG: Score-based Black-Box Watermarking by Fragile Trigger Generation for Deep Model Integrity Verification

Deep Neural Network Watermarking Against Model Extraction Attack

Decision-based iterative fragile watermarking for model integrity verification

Towards Robust Model Watermark Via Reducing Parametric Vulnerability

Making Watermark Survive Model Extraction Attacks in Graph Neural Networks

Verifying Integrity of Deep Ensemble Models by Lossless Black-box Watermarking with Sensitive Samples

Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication

Effectiveness of Distillation Attack and Countermeasure on Neural Network Watermarking

Watermarking in Deep Neural Networks Via Error Back-propagation

DeepiSign: Invisible Fragile Watermark to Protect the Integrity and Authenticity of CNN

Persistent and Unforgeable Watermarks for Deep Neural Networks

Certified Neural Network Watermarks with Randomized Smoothing