Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu
2024-06-13
Abstract: Neural networks have increasingly influenced people's lives. Ensuring that neural networks are deployed faithfully, as designed by their model owners, is crucial, because they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions, by detecting any tampering with the model as sensitively as possible. However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decisions on sensitive samples change as much as possible and that the Top-1 labels change easily regardless of the direction in which the boundary moves.
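A worked toy case shows why a pair is more sensitive than a single sample. The setup is an illustrative assumption and not the paper's model. It uses a binary linear score $f(x) = w^\top x + b$ with Top-1 label 1 when $f(x) > 0$, a point $x_0$ on its boundary, a pair offset by $\varepsilon$ along the unit normal, and tampering modeled as a bias shift $\delta$:

$$
\begin{aligned}
&w^\top x_0 + b = 0, \qquad x_\pm = x_0 \pm \varepsilon \frac{w}{\lVert w\rVert} \;\Rightarrow\; f(x_\pm) = \pm\,\varepsilon\lVert w\rVert,\\
&b \to b + \delta:\quad f'(x_+) = \varepsilon\lVert w\rVert + \delta,\qquad f'(x_-) = -\varepsilon\lVert w\rVert + \delta,\\
&\delta < -\varepsilon\lVert w\rVert \Rightarrow x_+ \text{ flips from 1 to 0},\qquad \delta > \varepsilon\lVert w\rVert \Rightarrow x_- \text{ flips from 0 to 1}.
\end{aligned}
$$

A single sample on one side of the boundary reveals a shift in only one direction. The pair reveals a shift of either sign once $|\delta| > \varepsilon\lVert w\rVert$.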
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the integrity of deep neural networks (DNNs) after deployment. A deployed model may be modified, maliciously or unintentionally, in ways that lead it to make wrong decisions. The paper focuses on generating highly sensitive sample pairs that reveal changes to the model's decision boundary, which makes it possible to monitor the model's integrity effectively. Existing watermarking methods generate samples inefficiently and are not sensitive enough, which limits their practical use.

The proposed method analyzes the characteristics of the model boundary and introduces a loss function that drives samples toward the most volatile boundary regions. It also uses sample pairing to place the model boundary between the two samples of each pair while maximizing the output logits. As a result, the decisions on sensitive samples change as much as possible, and the Top-1 label changes easily whichever direction the boundary moves. Experiments on multiple models and datasets show that the method achieves higher sensitivity and generation efficiency than existing methods.

### Main contributions:
1. **Enhanced sensitivity**: Based on an analysis of the model boundary's characteristics, a loss function is introduced that drives samples toward the most volatile boundary regions.
2. **Further enhanced sensitivity**: A two-stage sample generation process produces sample pairs that sandwich the model boundary between them.
3. **Improved efficiency**: An additional binary classification layer avoids the difficulty of producing an averaged (evenly split) output vector in the multi-class setting, which reduces computational cost.
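The three contributions can be sketched on a toy model. The sketch is hypothetical throughout:

- The classifier is linear.
- `key_layer` is an assumed 2 x C Gaussian matrix seeded by the user's key.
- The stage-1 loss $m(x)^2 - \lambda(u_0 + u_1)$ is one plausible reading of "approach the boundary while maximizing logits", not the paper's formula. Gradient descent on it is equivalent to gradient ascent on its negative.
- Stage 2 projects onto the boundary analytically, which only works because the toy model is linear. A real network would need iterative steps.

```python
import math
import random


def key_layer(key, num_classes):
    """HYPOTHETICAL key layer: a fixed 2 x C Gaussian matrix seeded by the user's key."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_classes)] for _ in range(2)]


def compose(k, w):
    """Fold the key layer into a toy linear classifier W (C x d): A = K @ W (2 x d)."""
    return [[sum(k[i][c] * w[c][j] for c in range(len(w))) for j in range(len(w[0]))]
            for i in range(len(k))]


def binary_logits(a, x):
    """Binary logits u = A @ x after the key layer."""
    return [sum(aj * xj for aj, xj in zip(row, x)) for row in a]


def binary_label(a, x):
    """Top-1 label in the binary space: 0 if u0 > u1, else 1."""
    u = binary_logits(a, x)
    return 0 if u[0] > u[1] else 1


def stage1(a, x, lam=0.1, steps=200):
    """Stage 1 (ASSUMED loss): descend on L(x) = m(x)^2 - lam * (u0 + u1) inside [0, 1]^d,
    pulling x toward the boundary m = u0 - u1 = 0 while rewarding large logits."""
    g_m = [a0 - a1 for a0, a1 in zip(a[0], a[1])]  # gradient of m
    g_u = [a0 + a1 for a0, a1 in zip(a[0], a[1])]  # gradient of u0 + u1
    lr = 0.5 / sum(g * g for g in g_m)  # half the stability limit 1/|g_m|^2 of the m^2 term
    for _ in range(steps):
        u = binary_logits(a, x)
        m = u[0] - u[1]
        grad = [2 * m * gm - lam * gu for gm, gu in zip(g_m, g_u)]
        x = [min(1.0, max(0.0, xi - lr * gi)) for xi, gi in zip(x, grad)]
    return x


def stage2(a, x, eps=1e-3):
    """Stage 2: snap x onto m = 0 (exact only because the toy model is linear), then
    split it into a pair eps away on either side so the boundary lies between them.
    No clipping here, for simplicity."""
    g_m = [a0 - a1 for a0, a1 in zip(a[0], a[1])]
    sq = sum(g * g for g in g_m)
    u = binary_logits(a, x)
    m = u[0] - u[1]
    on_boundary = [xi - m * g / sq for xi, g in zip(x, g_m)]
    step = [eps * g / math.sqrt(sq) for g in g_m]
    x_pos = [p + s for p, s in zip(on_boundary, step)]
    x_neg = [p - s for p, s in zip(on_boundary, step)]
    return x_pos, x_neg


rng = random.Random(0)
num_classes, dim = 4, 3
W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]  # toy "network"
A = compose(key_layer(1234, num_classes), W)
x1 = stage1(A, [rng.random() for _ in range(dim)])
x_pos, x_neg = stage2(A, x1)
print(binary_label(A, x_pos), binary_label(A, x_neg))  # opposite labels: boundary in between
```

Folding the key layer into the classifier means the generator only has to balance two logits, $u_0 = u_1$, instead of equalizing all C class scores. This is the efficiency argument behind the extra binary layer, shown here under the toy assumptions above.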
### Method overview:
- **Additional binary classification layer**: A linear layer generated from a key recorded by the user reduces the multi-class problem to a binary one, which makes an averaged classification result easier to reach.
- **Loss function**: Maximizing the activation values of the final layer places samples in regions of the model space that are particularly sensitive to changes.
- **Approaching the model boundary**: Gradient ascent controls the distance between each sample and the model boundary, keeping samples close to the boundary while they remain highly activated.

### Experimental results:
- **Sensitivity**: Across multiple models and datasets, the method detects changes caused by backdoor attacks, fine-tuning, pruning, and quantization with significantly higher sensitivity than other methods.
- **Generation efficiency**: When generating sensitive samples, the method iterates nearly as fast as the approach that only averages the output results, and it remains fairly robust at low learning rates.

### Conclusion:
The proposed method generates sensitive samples efficiently and detects small changes to the model with high sensitivity, offering an effective solution for the integrity protection of deep neural networks.
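The verification side can be sketched the same way. Everything here is a hypothetical toy:

- `predict` is a binary linear model.
- Pairs are built by projecting anchors onto its boundary.
- Fine-tuning, pruning, quantization, and boundary shifts are replaced by crude stand-ins.

The owner records the Top-1 labels of every sample (`fingerprint`) and flags the deployed model if any recorded label has changed.

```python
import math


def predict(w, b, x):
    """Top-1 label of a toy binary linear model: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def make_pair(w, b, anchor, eps):
    """Project `anchor` onto w.x + b = 0, then step +/- eps along the unit normal."""
    sq = sum(wi * wi for wi in w)
    score = sum(wi * ai for wi, ai in zip(w, anchor)) + b
    on_b = [ai - score * wi / sq for wi, ai in zip(w, anchor)]
    unit = [wi / math.sqrt(sq) for wi in w]
    return ([p + eps * u for p, u in zip(on_b, unit)],
            [p - eps * u for p, u in zip(on_b, unit)])


def fingerprint(w, b, samples):
    """Top-1 labels the owner records on the original model."""
    return [predict(w, b, x) for x in samples]


def tampered(w, b, samples, reference):
    """Flag the deployed model if any recorded label has changed."""
    return fingerprint(w, b, samples) != reference


W, B, EPS = [2.0, -1.0], 0.5, 0.01
pairs = [make_pair(W, B, a, EPS) for a in ([1.0, 3.0], [0.0, 0.0], [-1.0, 1.0])]
samples = [x for pair in pairs for x in pair]
reference = fingerprint(W, B, samples)

# Hypothetical stand-ins for the modifications studied in the paper.
modified = {
    "unchanged": (W, B),
    "fine-tune (w0 += 0.05)": ([W[0] + 0.05, W[1]], B),
    "prune smaller |w| (W[1])": ([W[0], 0.0], B),
    "round to integers": ([round(v) for v in W], round(B)),  # round(0.5) == 0 (half to even)
    "bias +0.05": (W, B + 0.05),
    "bias -0.05": (W, B - 0.05),
}
results = {name: tampered(w, b, samples, reference) for name, (w, b) in modified.items()}
for name, flagged in results.items():
    print(f"{name:26s} tampering detected: {flagged}")

# One-sided check: keep only the positive member of each pair.
pos_only = [p for p, _ in pairs]
one_sided = tampered(W, B + 0.05, pos_only, fingerprint(W, B, pos_only))
print("one-sided samples catch bias +0.05:", one_sided)
```

The last check keeps only one sample of each pair. It misses the upward boundary shift that the full pairs catch, which mirrors the paper's argument that pairing makes the Top-1 labels change whichever way the boundary moves.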