Abstract: Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. They ensure that any tampering with the model is detected as sensitively as possible. However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decisions on sensitive samples change as much as possible and that the Top-1 labels change easily regardless of the direction in which the boundary moves.

What problem does this paper attempt to address?

The problem this paper addresses is how to protect the integrity of deep neural networks (DNNs) against malicious or unintentional modification after deployment, since such modifications can cause the model to make wrong decisions. Specifically, the paper focuses on generating highly sensitive sample pairs that detect changes in the model boundaries, so that model integrity can be monitored effectively. Existing watermarking methods generate samples inefficiently and are insufficiently sensitive, which limits their practical use.

The paper therefore proposes a new method. It analyzes the characteristics of the model boundary and introduces a loss function that drives samples toward the most volatile boundary areas. At the same time, it uses sample pairing to place the model boundary between the two samples of each pair while maximizing the output logits. As a result, the model's decisions on the sensitive samples change as much as possible, and the Top-1 label changes easily regardless of the direction in which the boundary moves. Experimental evaluations show that the method achieves higher sensitivity and generation efficiency than existing methods on multiple models and datasets.

### Main contributions:

1. **Enhanced sensitivity**: By analyzing the characteristics of the model boundary, a loss function is introduced to approach the most volatile boundary areas.
2. **Further enhanced sensitivity**: A two-stage sample generation process is adopted to generate sample pairs so that the model boundary is sandwiched between the two samples of each pair.
3. **Improved efficiency**: An additional binary classification layer is introduced to address the difficulty of producing an averaged output vector in the multi-class setting and to reduce computational cost.
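
The idea behind sandwiching the boundary between a sample pair can be illustrated with a toy sketch. This is not the paper's algorithm: a 2-D linear binary classifier stands in for the DNN, and plain bisection stands in for the paper's loss-driven, two-stage generation. All names (`score`, `label`, `make_pair`, `gap`, `shift`) and all numeric values are hypothetical.

```python
# Toy illustration only: a linear classifier replaces the DNN, and bisection
# replaces the paper's gradient-based sample generation (both are assumptions).

def score(w, b, x):
    """Signed score of a linear classifier; the decision boundary is score == 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def label(w, b, x):
    """Top-1 label of the toy binary classifier."""
    return 1 if score(w, b, x) > 0 else 0

def make_pair(w, b, x_neg, x_pos, gap=1e-3):
    """Bisect the segment x_neg -> x_pos until the two endpoints lie within
    `gap` of each other (per coordinate) while still carrying different labels."""
    assert label(w, b, x_neg) == 0 and label(w, b, x_pos) == 1
    lo, hi = list(x_neg), list(x_pos)
    while max(abs(h - l) for l, h in zip(lo, hi)) > gap:
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        if label(w, b, mid) == 0:
            lo = mid   # midpoint is still on the label-0 side
        else:
            hi = mid   # midpoint is on the label-1 side
    return lo, hi

w, b = [1.0, -2.0], 0.5
pair = make_pair(w, b, x_neg=[-3.0, 1.0], x_pos=[4.0, 0.0])
labels_before = [label(w, b, x) for x in pair]

# Move the boundary slightly in each direction by perturbing the bias.
shift = 0.01
labels_up = [label(w, b + shift, x) for x in pair]
labels_down = [label(w, b - shift, x) for x in pair]

print(labels_before, labels_up, labels_down)
```

Because the two samples sit on opposite sides of the boundary and very close to it, a small shift in either direction flips the label of one member of the pair. A single near-boundary sample would only react to shifts in one direction.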
### Method overview:

- **Adding an additional binary classification layer**: A linear layer is generated from the user's recorded key. It reduces the multi-class problem to a binary one, which makes it easier to generate an averaged classification result.
- **Training the loss function**: By maximizing the activation value of the final layer, the samples are placed in areas of the model space that are particularly sensitive to changes.
- **Approaching the model boundary**: Gradient ascent is used to control the distance between the samples and the model boundary, ensuring that the samples lie near the boundary and are highly activated.

### Experimental results:

- **Sensitivity**: On multiple models and datasets, the method shows significantly higher sensitivity than other methods when detecting changes caused by backdoor attacks, fine-tuning, pruning, and quantization.
- **Generation efficiency**: When generating sensitive samples, the iteration speed of the method is close to that of a method that only averages the output results, and it remains reasonably robust at a low learning rate.

### Conclusion:

The proposed method generates sensitive samples efficiently and shows high sensitivity when detecting small changes in the model, providing an effective solution for the integrity protection of deep neural networks.
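
How sensitive samples might be used for an integrity check can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's verification protocol: the model is a toy three-class linear classifier, the "sensitive samples" are hand-picked points near one class boundary, and the names `top1`, `linear_model`, `record_fingerprint`, and `verify` are hypothetical.

```python
# Toy integrity check: record Top-1 labels of sensitive samples on the original
# model, then report which labels changed on the model under inspection.

def top1(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def linear_model(weights, x):
    """Toy stand-in for a deployed classifier: one weight row per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

def record_fingerprint(weights, samples):
    """Top-1 labels of the sensitive samples, stored by the model owner."""
    return [top1(linear_model(weights, x)) for x in samples]

def verify(weights, samples, fingerprint):
    """Indices of sensitive samples whose Top-1 label differs from the record."""
    return [i for i, (x, y) in enumerate(zip(samples, fingerprint))
            if top1(linear_model(weights, x)) != y]

original = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
# Hypothetical sensitive samples on either side of the class-0 / class-1 boundary.
samples = [[1.000, 0.999], [0.999, 1.000]]
fingerprint = record_fingerprint(original, samples)

# A 0.2% change to one weight, standing in for fine-tuning or quantization drift.
tampered = [[1.0, 0.0], [0.0, 1.002], [-1.0, -1.0]]
changed = verify(tampered, samples, fingerprint)

print(fingerprint, verify(original, samples, fingerprint), changed)
```

An empty result from `verify` means no recorded label changed; any non-empty result flags the model as modified. In this toy setup the check only needs the model's outputs on the stored samples, not its weights.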

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

Semi-Fragile Neural Network Watermarking Based on Adversarial Examples

Fragile Neural Network Watermarking with Trigger Image Set

FTG: Score-based Black-Box Watermarking by Fragile Trigger Generation for Deep Model Integrity Verification

Semi-fragile Neural Network Watermarking for Content Authentication and Tampering Localization

A Survey of Fragile Model Watermarking

Neural network fragile watermarking with no model performance degradation

Convolutional Neural Networks Tamper Detection and Location Based on Fragile Watermarking

Towards Robust Model Watermark Via Reducing Parametric Vulnerability

Making Watermark Survive Model Extraction Attacks in Graph Neural Networks

Decision-based iterative fragile watermarking for model integrity verification

High-Quality Triggers Based Fragile Watermarking for Optical Character Recognition Model

Deep Neural Network Watermarking Against Model Extraction Attack

Probabilistically Robust Watermarking of Neural Networks

Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Adaptive watermarking with self-mutual check parameters in deep neural networks

Verifying Integrity of Deep Ensemble Models by Lossless Black-box Watermarking with Sensitive Samples

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Protecting IP of Deep Neural Networks with Watermarking Using Logistic Disorder Generation Trigger Sets

Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation